[jira] [Commented] (SOLR-14505) Intermittent NPE in ZkController.registerLiveNodesListener()

2020-05-21 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113008#comment-17113008
 ] 

Andrzej Bialecki commented on SOLR-14505:
-

:D ok, let's close this one.

> Intermittent NPE in ZkController.registerLiveNodesListener()
> 
>
> Key: SOLR-14505
> URL: https://issues.apache.org/jira/browse/SOLR-14505
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3.1
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Minor
>
> Reported by [~cjcowie] in SOLR-13072 & mailing lists:
> "Running on Solr 8.3.1
> {code:java}
> 2020-05-19 03:44:40.220 INFO  (main) [   ] o.a.s.c.ZkContainer Zookeeper 
> client=X:9983/_cluster
> 2020-05-19 03:44:40.238 INFO  (main) [   ] o.a.s.c.c.SolrZkClient Using 
> ZkCredentialsProvider: 
> xxx.zookeeper.auth.internal.EncodedZkCredentialsProvider
> 2020-05-19 03:44:40.241 INFO  (main) [   ] o.a.s.c.c.ConnectionManager 
> Waiting for client to connect to ZooKeeper
> 2020-05-19 03:44:40.245 INFO  (zkConnectionManagerCallback-11-thread-1) [   ] 
> o.a.s.c.c.ConnectionManager zkClient has connected
> 2020-05-19 03:44:40.245 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client 
> is connected to ZooKeeper
> 2020-05-19 03:44:40.359 INFO  (main) [   ] o.a.s.c.c.ConnectionManager 
> Waiting for client to connect to ZooKeeper
> 2020-05-19 03:44:40.361 INFO  (zkConnectionManagerCallback-13-thread-1) [   ] 
> o.a.s.c.c.ConnectionManager zkClient has connected
> 2020-05-19 03:44:40.361 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client 
> is connected to ZooKeeper
> 2020-05-19 03:44:40.417 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updated 
> live nodes from ZooKeeper... (0) -> (1)
> 2020-05-19 03:44:56.606 INFO  (zkCallback-12-thread-2) [   ] 
> o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)
> 2020-05-19 03:44:56.614 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could 
> not start Solr. Check solr/home property and the logs
> 2020-05-19 03:44:56.639 ERROR (main) [   ] o.a.s.c.SolrCore 
> null:java.lang.NullPointerException
> at 
> org.apache.solr.cloud.ZkController.lambda$registerLiveNodesListener$10(ZkController.java:1020)
> at 
> org.apache.solr.common.cloud.ZkStateReader.registerLiveNodesListener(ZkStateReader.java:880)
> at 
> org.apache.solr.cloud.ZkController.registerLiveNodesListener(ZkController.java:1035)
> at org.apache.solr.cloud.ZkController.init(ZkController.java:917)
> at org.apache.solr.cloud.ZkController.(ZkController.java:473)
> at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:115)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:631)at 
> xxx.solr.servlet.RobustSolrDispatchFilter.createCoreContainer(RobustSolrDispatchFilter.java:71)
>  {code}
> I couldn't find any bug reports in JIRA for the NPE.
>   
>  Here's the full log
>  [https://drive.google.com/open?id=1hQrF25blNgKLXijOMYJ30wn-Lfy6uKVm]
>   
>  The NPE is coming from 
> [https://github.com/apache/lucene-solr/blob/c18666ad05afc02979c150aacd4810cff02e43f3/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L1020]
>   
>  _byte[] json = Utils.toJSON(Collections.singletonMap("timestamp", 
> cloudManager.getTimeSource().getEpochTimeNs()));_ 
>  so I don't know whether it's the cloudManager or the time source that's null.
>  That bit of the ZkController was added by 
> https://issues.apache.org/jira/browse/SOLR-13072 and I see it is only hit if
>  
> _zkStateReader.getAutoScalingConfig().hasTriggerForEvents(TriggerEventType.NODELOST);_
>   
>  We have never (knowingly) configured autoscaling and we don't use it, but I 
> see the autoscaling files are present in ZK. Is the autoscaling.json etc 
> created by default when it is absent in ZooKeeper?
>   
>  The interesting bit of the log above, aside from the NPE, is this:
>  _2020-05-19 03:44:40.417 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updated 
> live nodes from ZooKeeper... (0) -> (1)_
>  _2020-05-19 03:44:56.606 INFO  (zkCallback-12-thread-2) [   ] 
> o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)_
>  Which suggests to me that there's either a race condition or the problem is 
> caused by some zookeeper outage during startup. Since there's a 16 second gap 
> between those messages.
>   
>  It's possible that the problem is in some way caused by our own code in 
> xxx.solr.servlet.RobustSolrDispatchFilter which wraps the 
> SolrDispatchFIlter, and creates a SolrZkClient in a try/with resources (so 
> should be autoclosed), but that all happens before createCoreContainer is 
> called."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SOLR-14505) Intermittent NPE in ZkController.registerLiveNodesListener()

2020-05-21 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113005#comment-17113005
 ] 

Colvin Cowie commented on SOLR-14505:
-

Oh dear, another race condition: I created 
https://issues.apache.org/jira/browse/SOLR-14504 a little while ago :)

> Intermittent NPE in ZkController.registerLiveNodesListener()
> 
>
> Key: SOLR-14505
> URL: https://issues.apache.org/jira/browse/SOLR-14505
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3.1
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Minor
>
> Reported by [~cjcowie] in SOLR-13072 & mailing lists:
> "Running on Solr 8.3.1
> {code:java}
> 2020-05-19 03:44:40.220 INFO  (main) [   ] o.a.s.c.ZkContainer Zookeeper 
> client=X:9983/_cluster
> 2020-05-19 03:44:40.238 INFO  (main) [   ] o.a.s.c.c.SolrZkClient Using 
> ZkCredentialsProvider: 
> xxx.zookeeper.auth.internal.EncodedZkCredentialsProvider
> 2020-05-19 03:44:40.241 INFO  (main) [   ] o.a.s.c.c.ConnectionManager 
> Waiting for client to connect to ZooKeeper
> 2020-05-19 03:44:40.245 INFO  (zkConnectionManagerCallback-11-thread-1) [   ] 
> o.a.s.c.c.ConnectionManager zkClient has connected
> 2020-05-19 03:44:40.245 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client 
> is connected to ZooKeeper
> 2020-05-19 03:44:40.359 INFO  (main) [   ] o.a.s.c.c.ConnectionManager 
> Waiting for client to connect to ZooKeeper
> 2020-05-19 03:44:40.361 INFO  (zkConnectionManagerCallback-13-thread-1) [   ] 
> o.a.s.c.c.ConnectionManager zkClient has connected
> 2020-05-19 03:44:40.361 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client 
> is connected to ZooKeeper
> 2020-05-19 03:44:40.417 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updated 
> live nodes from ZooKeeper... (0) -> (1)
> 2020-05-19 03:44:56.606 INFO  (zkCallback-12-thread-2) [   ] 
> o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)
> 2020-05-19 03:44:56.614 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could 
> not start Solr. Check solr/home property and the logs
> 2020-05-19 03:44:56.639 ERROR (main) [   ] o.a.s.c.SolrCore 
> null:java.lang.NullPointerException
> at 
> org.apache.solr.cloud.ZkController.lambda$registerLiveNodesListener$10(ZkController.java:1020)
> at 
> org.apache.solr.common.cloud.ZkStateReader.registerLiveNodesListener(ZkStateReader.java:880)
> at 
> org.apache.solr.cloud.ZkController.registerLiveNodesListener(ZkController.java:1035)
> at org.apache.solr.cloud.ZkController.init(ZkController.java:917)
> at org.apache.solr.cloud.ZkController.(ZkController.java:473)
> at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:115)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:631)at 
> xxx.solr.servlet.RobustSolrDispatchFilter.createCoreContainer(RobustSolrDispatchFilter.java:71)
>  {code}
> I couldn't find any bug reports in JIRA for the NPE.
>   
>  Here's the full log
>  [https://drive.google.com/open?id=1hQrF25blNgKLXijOMYJ30wn-Lfy6uKVm]
>   
>  The NPE is coming from 
> [https://github.com/apache/lucene-solr/blob/c18666ad05afc02979c150aacd4810cff02e43f3/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L1020]
>   
>  _byte[] json = Utils.toJSON(Collections.singletonMap("timestamp", 
> cloudManager.getTimeSource().getEpochTimeNs()));_ 
>  so I don't know whether it's the cloudManager or the time source that's null.
>  That bit of the ZkController was added by 
> https://issues.apache.org/jira/browse/SOLR-13072 and I see it is only hit if
>  
> _zkStateReader.getAutoScalingConfig().hasTriggerForEvents(TriggerEventType.NODELOST);_
>   
>  We have never (knowingly) configured autoscaling and we don't use it, but I 
> see the autoscaling files are present in ZK. Is the autoscaling.json etc 
> created by default when it is absent in ZooKeeper?
>   
>  The interesting bit of the log above, aside from the NPE, is this:
>  _2020-05-19 03:44:40.417 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updated 
> live nodes from ZooKeeper... (0) -> (1)_
>  _2020-05-19 03:44:56.606 INFO  (zkCallback-12-thread-2) [   ] 
> o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)_
>  Which suggests to me that there's either a race condition or the problem is 
> caused by some zookeeper outage during startup. Since there's a 16 second gap 
> between those messages.
>   
>  It's possible that the problem is in some way caused by our own code in 
> xxx.solr.servlet.RobustSolrDispatchFilter which wraps the 
> SolrDispatchFIlter, and creates a SolrZkClient in a try/with resources (so 
> should be autoclosed), but that all happens before createCoreContainer is 
> called."



--
This message was sent by Atlassian Jira