[ 
https://issues.apache.org/jira/browse/SOLR-14505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-14505:
------------------------------------
    Description: 
Reported by [~cjcowie] in SOLR-13072 & mailing lists:

"Running on Solr 8.3.1
{code:java}
2020-05-19 03:44:40.220 INFO  (main) [   ] o.a.s.c.ZkContainer Zookeeper 
client=XXXXXXXXX:9983/xxxx_cluster
2020-05-19 03:44:40.238 INFO  (main) [   ] o.a.s.c.c.SolrZkClient Using 
ZkCredentialsProvider: 
xxxxxxx.zookeeper.auth.internal.EncodedZkCredentialsProvider
2020-05-19 03:44:40.241 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Waiting 
for client to connect to ZooKeeper
2020-05-19 03:44:40.245 INFO  (zkConnectionManagerCallback-11-thread-1) [   ] 
o.a.s.c.c.ConnectionManager zkClient has connected
2020-05-19 03:44:40.245 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client 
is connected to ZooKeeper
2020-05-19 03:44:40.359 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Waiting 
for client to connect to ZooKeeper
2020-05-19 03:44:40.361 INFO  (zkConnectionManagerCallback-13-thread-1) [   ] 
o.a.s.c.c.ConnectionManager zkClient has connected
2020-05-19 03:44:40.361 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client 
is connected to ZooKeeper
2020-05-19 03:44:40.417 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updated live 
nodes from ZooKeeper... (0) -> (1)
2020-05-19 03:44:56.606 INFO  (zkCallback-12-thread-2) [   ] 
o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)
2020-05-19 03:44:56.614 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could not 
start Solr. Check solr/home property and the logs
2020-05-19 03:44:56.639 ERROR (main) [   ] o.a.s.c.SolrCore 
null:java.lang.NullPointerException
at 
org.apache.solr.cloud.ZkController.lambda$registerLiveNodesListener$10(ZkController.java:1020)
at 
org.apache.solr.common.cloud.ZkStateReader.registerLiveNodesListener(ZkStateReader.java:880)
at 
org.apache.solr.cloud.ZkController.registerLiveNodesListener(ZkController.java:1035)
at org.apache.solr.cloud.ZkController.init(ZkController.java:917)
at org.apache.solr.cloud.ZkController.<init>(ZkController.java:473)
at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:115)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:631)at 
xxxxxxx.solr.servlet.RobustSolrDispatchFilter.createCoreContainer(RobustSolrDispatchFilter.java:71)
 {code}
I couldn't find any bug reports in JIRA for the NPE.
  
 Here's the full log
 [https://drive.google.com/open?id=1hQrF25blNgKLXijOMYJ30wn-Lfy6uKVm]
  
 The NPE is coming from 
[https://github.com/apache/lucene-solr/blob/c18666ad05afc02979c150aacd4810cff02e43f3/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L1020]
  
 _byte[] json = Utils.toJSON(Collections.singletonMap("timestamp", 
cloudManager.getTimeSource().getEpochTimeNs()));_ 
 so I don't know whether it's the cloudManager or the time source that's null.
 That bit of the ZkController was added by 
https://issues.apache.org/jira/browse/SOLR-13072 and I see it is only hit if
 
_zkStateReader.getAutoScalingConfig().hasTriggerForEvents(TriggerEventType.NODELOST);_
  
 We have never (knowingly) configured autoscaling and we don't use it, but I 
see the autoscaling files are present in ZK. Is the autoscaling.json etc 
created by default when it is absent in ZooKeeper?
  
 The interesting bit of the log above, aside from the NPE, is this:
 _2020-05-19 03:44:40.417 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updated 
live nodes from ZooKeeper... (0) -> (1)_
 _2020-05-19 03:44:56.606 INFO  (zkCallback-12-thread-2) [   ] 
o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)_
 Which suggests to me that there's either a race condition or the problem is 
caused by some zookeeper outage during startup. Since there's a 16 second gap 
between those messages.
  
 It's possible that the problem is in some way caused by our own code in 
xxxxxxx.solr.servlet.RobustSolrDispatchFilter which wraps the 
SolrDispatchFIlter, and creates a SolrZkClient in a try/with resources (so 
should be autoclosed), but that all happens before createCoreContainer is 
called."

  was:
Reported by [~cjcowie] in SOLR-13072 & mailing lists:

"Running on Solr 8.3.1
{code:java}
2020-05-19 03:44:40.220 INFO  (main) [   ] o.a.s.c.ZkContainer Zookeeper 
client=XXXXXXXXX:9983/xxxx_cluster
2020-05-19 03:44:40.238 INFO  (main) [   ] o.a.s.c.c.SolrZkClient Using 
ZkCredentialsProvider: 
xxxxxxx.zookeeper.auth.internal.EncodedZkCredentialsProvider
2020-05-19 03:44:40.241 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Waiting 
for client to connect to ZooKeeper
2020-05-19 03:44:40.245 INFO  (zkConnectionManagerCallback-11-thread-1) [   ] 
o.a.s.c.c.ConnectionManager zkClient has connected
2020-05-19 03:44:40.245 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client 
is connected to ZooKeeper
2020-05-19 03:44:40.359 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Waiting 
for client to connect to ZooKeeper
2020-05-19 03:44:40.361 INFO  (zkConnectionManagerCallback-13-thread-1) [   ] 
o.a.s.c.c.ConnectionManager zkClient has connected
2020-05-19 03:44:40.361 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client 
is connected to ZooKeeper
2020-05-19 03:44:40.417 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updated live 
nodes from ZooKeeper... (0) -> (1)
2020-05-19 03:44:56.606 INFO  (zkCallback-12-thread-2) [   ] 
o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)
2020-05-19 03:44:56.614 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could not 
start Solr. Check solr/home property and the logs
2020-05-19 03:44:56.639 ERROR (main) [   ] o.a.s.c.SolrCore 
null:java.lang.NullPointerException
at 
org.apache.solr.cloud.ZkController.lambda$registerLiveNodesListener$10(ZkController.java:1020)
at 
org.apache.solr.common.cloud.ZkStateReader.registerLiveNodesListener(ZkStateReader.java:880)
at 
org.apache.solr.cloud.ZkController.registerLiveNodesListener(ZkController.java:1035)
at org.apache.solr.cloud.ZkController.init(ZkController.java:917)
at org.apache.solr.cloud.ZkController.<init>(ZkController.java:473)
at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:115)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:631)at 
xxxxxxx.solr.servlet.RobustSolrDispatchFilter.createCoreContainer(RobustSolrDispatchFilter.java:71)
 {code}
I couldn't find any bug reports in JIRA for the NPE.
 
Here's the full log
[https://drive.google.com/open?id=1hQrF25blNgKLXijOMYJ30wn-Lfy6uKVm]
 
The NPE is coming from 
[https://github.com/apache/lucene-solr/blob/c18666ad05afc02979c150aacd4810cff02e43f3/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L1020]
 
_byte[] json = Utils.toJSON(Collections.singletonMap("timestamp", 
cloudManager.getTimeSource().getEpochTimeNs()));_ 
so I don't know whether it's the cloudManager or the time source that's null.
That bit of the ZkController was added by 
https://issues.apache.org/jira/browse/SOLR-13072 and I see it is only hit if
_zkStateReader.getAutoScalingConfig().hasTriggerForEvents(TriggerEventType.NODELOST);_
 
We have never (knowingly) configured autoscaling and we don't use it, but I see 
the autoscaling files are present in ZK. Is the autoscaling.json etc created by 
default when it is absent in ZooKeeper?
 
The interesting bit of the log above, aside from the NPE, is this:
_2020-05-19 03:44:40.417 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updated 
live nodes from ZooKeeper... (0) -> (1)_
_2020-05-19 03:44:56.606 INFO  (zkCallback-12-thread-2) [   ] 
o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)_
Which suggests to me that there's either a race condition or the problem is 
caused by some zookeeper outage during startup. Since there's a 16 second gap 
between those messages.
 
It's possible that the problem is in some way caused by our own code in 
xxxxxxx.solr.servlet.RobustSolrDispatchFilter which wraps the 
SolrDispatchFIlter, and creates a SolrZkClient in a try/with resources (so 
should be autoclosed), but that all happens before createCoreContainer is 
called.


> Intermittent NPE in ZkController.registerLiveNodesListener()
> ------------------------------------------------------------
>
>                 Key: SOLR-14505
>                 URL: https://issues.apache.org/jira/browse/SOLR-14505
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 8.3.1
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Minor
>
> Reported by [~cjcowie] in SOLR-13072 & mailing lists:
> "Running on Solr 8.3.1
> {code:java}
> 2020-05-19 03:44:40.220 INFO  (main) [   ] o.a.s.c.ZkContainer Zookeeper 
> client=XXXXXXXXX:9983/xxxx_cluster
> 2020-05-19 03:44:40.238 INFO  (main) [   ] o.a.s.c.c.SolrZkClient Using 
> ZkCredentialsProvider: 
> xxxxxxx.zookeeper.auth.internal.EncodedZkCredentialsProvider
> 2020-05-19 03:44:40.241 INFO  (main) [   ] o.a.s.c.c.ConnectionManager 
> Waiting for client to connect to ZooKeeper
> 2020-05-19 03:44:40.245 INFO  (zkConnectionManagerCallback-11-thread-1) [   ] 
> o.a.s.c.c.ConnectionManager zkClient has connected
> 2020-05-19 03:44:40.245 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client 
> is connected to ZooKeeper
> 2020-05-19 03:44:40.359 INFO  (main) [   ] o.a.s.c.c.ConnectionManager 
> Waiting for client to connect to ZooKeeper
> 2020-05-19 03:44:40.361 INFO  (zkConnectionManagerCallback-13-thread-1) [   ] 
> o.a.s.c.c.ConnectionManager zkClient has connected
> 2020-05-19 03:44:40.361 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client 
> is connected to ZooKeeper
> 2020-05-19 03:44:40.417 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updated 
> live nodes from ZooKeeper... (0) -> (1)
> 2020-05-19 03:44:56.606 INFO  (zkCallback-12-thread-2) [   ] 
> o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)
> 2020-05-19 03:44:56.614 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could 
> not start Solr. Check solr/home property and the logs
> 2020-05-19 03:44:56.639 ERROR (main) [   ] o.a.s.c.SolrCore 
> null:java.lang.NullPointerException
> at 
> org.apache.solr.cloud.ZkController.lambda$registerLiveNodesListener$10(ZkController.java:1020)
> at 
> org.apache.solr.common.cloud.ZkStateReader.registerLiveNodesListener(ZkStateReader.java:880)
> at 
> org.apache.solr.cloud.ZkController.registerLiveNodesListener(ZkController.java:1035)
> at org.apache.solr.cloud.ZkController.init(ZkController.java:917)
> at org.apache.solr.cloud.ZkController.<init>(ZkController.java:473)
> at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:115)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:631)at 
> xxxxxxx.solr.servlet.RobustSolrDispatchFilter.createCoreContainer(RobustSolrDispatchFilter.java:71)
>  {code}
> I couldn't find any bug reports in JIRA for the NPE.
>   
>  Here's the full log
>  [https://drive.google.com/open?id=1hQrF25blNgKLXijOMYJ30wn-Lfy6uKVm]
>   
>  The NPE is coming from 
> [https://github.com/apache/lucene-solr/blob/c18666ad05afc02979c150aacd4810cff02e43f3/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L1020]
>   
>  _byte[] json = Utils.toJSON(Collections.singletonMap("timestamp", 
> cloudManager.getTimeSource().getEpochTimeNs()));_ 
>  so I don't know whether it's the cloudManager or the time source that's null.
>  That bit of the ZkController was added by 
> https://issues.apache.org/jira/browse/SOLR-13072 and I see it is only hit if
>  
> _zkStateReader.getAutoScalingConfig().hasTriggerForEvents(TriggerEventType.NODELOST);_
>   
>  We have never (knowingly) configured autoscaling and we don't use it, but I 
> see the autoscaling files are present in ZK. Is the autoscaling.json etc 
> created by default when it is absent in ZooKeeper?
>   
>  The interesting bit of the log above, aside from the NPE, is this:
>  _2020-05-19 03:44:40.417 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updated 
> live nodes from ZooKeeper... (0) -> (1)_
>  _2020-05-19 03:44:56.606 INFO  (zkCallback-12-thread-2) [   ] 
> o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (0)_
>  Which suggests to me that there's either a race condition or the problem is 
> caused by some zookeeper outage during startup. Since there's a 16 second gap 
> between those messages.
>   
>  It's possible that the problem is in some way caused by our own code in 
> xxxxxxx.solr.servlet.RobustSolrDispatchFilter which wraps the 
> SolrDispatchFIlter, and creates a SolrZkClient in a try/with resources (so 
> should be autoclosed), but that all happens before createCoreContainer is 
> called."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to