Hi,

After reading some Solr source code, I might have found the cause:

There was indeed a change in Solr 8.6 that leads to the NullPointerException
for the CoreAdmin STATUS request in CoreAdminOperation#getCoreStatus. The
instancePath is not retrieved from the ResourceLoader anymore, but from the
registered CoreDescriptor. See commit [1]. 

SolrCore.getInstancePath(SolrCore.java:333) throws an NPE because the
CoreContainer does not have a CoreDescriptor for the name, even though a
SolrCore is available in the CoreContainer under that name (retrieved some
lines above). This inconsistency is persistent: All STATUS requests keep
failing until Solr is restarted.

IIUC, the underlying problem is that CoreContainer#create does not correctly
handle concurrent requests to create the same core. There's a race condition
(see TODO comment [2]), so CoreContainer#createFromDescriptor may be called
twice for the same core. The second call then fails to create an
IndexWriter (LockObtainFailedException), and this causes a call to
SolrCores#removeCoreDescriptor [3]. This means the second call removes the
CoreDescriptor for the SolrCore created by the first call. This is the
inconsistency that causes the NPE in CoreAdminOperation#getCoreStatus.
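
To make the sequence concrete, here is a minimal sketch of how I understand
it. This is NOT Solr code: the class CoreRegistrySketch, its maps and its
methods are hypothetical stand-ins for the descriptor registry, the core
registry and the index write lock. It only assumes that the failure cleanup
removes the descriptor by name, regardless of which call registered it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the suspected sequence; not Solr's real API.
class CoreRegistrySketch {
    // name -> instance path (stand-in for the registered CoreDescriptor)
    private final Map<String, String> descriptors = new ConcurrentHashMap<>();
    // name -> core object (stand-in for the registered SolrCore)
    private final Map<String, Object> cores = new ConcurrentHashMap<>();
    // name -> marker (stand-in for the index write lock)
    private final Map<String, Boolean> indexLocks = new ConcurrentHashMap<>();

    /** Registers a descriptor, tries to open the index, and cleans up on failure. */
    boolean create(String name, String instancePath) {
        descriptors.put(name, instancePath);
        try {
            if (indexLocks.putIfAbsent(name, Boolean.TRUE) != null) {
                // Stand-in for LockObtainFailedException in the second call.
                throw new IllegalStateException("index lock already held for " + name);
            }
            cores.put(name, new Object());
            return true;
        } catch (IllegalStateException e) {
            // Assumed cleanup: removes the descriptor by name, even though
            // a core created by an earlier call is still registered.
            descriptors.remove(name);
            return false;
        }
    }

    /** Stand-in for the STATUS request: the core is found, the descriptor is not. */
    String status(String name) {
        if (!cores.containsKey(name)) {
            return "no such core";
        }
        String instancePath = descriptors.get(name); // null after the failed second create
        return "instanceDir=" + instancePath.trim(); // throws NullPointerException on null
    }

    boolean hasCore(String name) {
        return cores.containsKey(name);
    }

    boolean hasDescriptor(String name) {
        return descriptors.containsKey(name);
    }

    public static void main(String[] args) {
        CoreRegistrySketch registry = new CoreRegistrySketch();
        System.out.println("first create:  " + registry.create("core1", "/tmp/core1"));
        System.out.println("second create: " + registry.create("core1", "/tmp/core1"));
        System.out.println("core present: " + registry.hasCore("core1")
                + ", descriptor present: " + registry.hasDescriptor("core1"));
        try {
            registry.status("core1");
        } catch (NullPointerException e) {
            System.out.println("STATUS fails with NullPointerException");
        }
    }
}
```

In this model every later status call for that name keeps failing, because
nothing restores the removed descriptor. That would match the behaviour I
see until Solr is restarted.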

Does this sound reasonable?

I'll create a JIRA ticket tomorrow, if that's okay.

Thank you,
Andreas

[1]
https://github.com/apache/lucene-solr/commit/17ae79b0905b2bf8635c1b260b30807cae2f5463#diff-9652fe8353b7eff59cd6f128bb2699d88361e670b840ee5ca1018b1bc45584d1R324
[2]
https://github.com/apache/lucene-solr/blob/15241573d3c8da0db3dfd380d99e4efcfe500c2e/solr/core/src/java/org/apache/solr/core/CoreContainer.java#L1242
[3]
https://github.com/apache/lucene-solr/blob/15241573d3c8da0db3dfd380d99e4efcfe500c2e/solr/core/src/java/org/apache/solr/core/CoreContainer.java#L1407



