Hi,

we're running tests on a stand-alone Solr instance, which create Solr cores from multiple applications using CoreAdmin (via SolrJ).

Lately, we upgraded from 8.4.1 to 8.6.3, and sometimes we now see a LockObtainFailedException for a lock held by the same JVM, after which Solr is broken and runs into NullPointerExceptions for simple CoreAdmin STATUS requests. We have to restart Solr then. I've never seen this with 8.4.1 or previous releases.

This bug is quite severe for us because it breaks our system tests with Solr, and we fear that it may also happen in production. Is this a known bug?

Our applications use a CoreAdmin STATUS request to check whether a core exists, followed by a CREATE request, if the core does not exist. With multiple applications, and bad timing, two concurrent CREATE requests for the same core are of course still possible. Solr 8.4.1 rejected duplicate requests and logged ERRORs but kept working correctly [1]. I can still see the same log messages in 8.6.3 ("Core with name ... already exists" or "Error CREATEing SolrCore ... Could not create a new core in ... as another core is already defined there") - but sometimes also the following error, after which Solr is broken:

2020-10-27 00:29:25.350 ERROR (qtp2029754983-24) [   ] 
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error 
CREATEing SolrCore 'blueprint_acgqqafsogyc_comments': Unable to create core 
[blueprint_acgqqafsogyc_comments] Caused by: Lock held by this virtual machine: 
/var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock

        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1312)
        at 
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:95)
        at 
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
...
Caused by: org.apache.solr.common.SolrException: Unable to create core 
[blueprint_acgqqafsogyc_comments]
        at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1408)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1273)
        ... 47 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1071)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:906)
        at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1387)
        ... 48 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2184)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2308)
        at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1130)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1012)
        ... 50 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this 
virtual machine: 
/var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
        at 
org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:139)
        at 
org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
        at 
org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
        at 
org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:785)
        at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:126)
        at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:100)
        at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:261)
        at 
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:135)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2145)

2020-10-27 00:29:25.353 INFO  (qtp2029754983-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null 
path=/admin/cores 
params={core=blueprint_acgqqafsogyc_comments&action=STATUS&indexInfo=false&wt=javabin&version=2}
 status=500 QTime=0
2020-10-27 00:29:25.353 ERROR (qtp2029754983-19) [   ] o.a.s.s.HttpSolrCall 
null:org.apache.solr.common.SolrException: Error handling 'STATUS' action
        at 
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:372)
        at 
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)
        at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
...
Caused by: java.lang.NullPointerException
        at org.apache.solr.core.SolrCore.getInstancePath(SolrCore.java:333)
        at 
org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:329)
        at org.apache.solr.handler.admin.StatusOp.execute(StatusOp.java:54)
        at 
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)


Any ideas? Were there any recent changes since 8.4.1 that could have caused this?

Thank you,
Andreas


[1] BTW, it would be nice if Solr would support an atomic request "create core if not exists" so that I wouldn't need separate STATUS/CREATE requests and could avoid ERRORs in the log. But this is not critical, and not the problem here. (I asked for this years ago: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201606.mbox/%3C5762B310.60600%40coremedia.com%3E )


Reply via email to