Hi,
I think there's a bug in Solr 9.2.1 that can cause a deadlock at
startup. In rare cases, the servlet container startup thread gets
blocked, and there is no other thread that could unblock it.
"main" #1 prio=5 os_prio=0 cpu=5922.39ms elapsed=7490.27s
tid=0x00007f637402ae70 nid=0x47 waiting on condition [0x00007f6379488000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@17.0.9/Native Method)
    - parking to wait for <0x0000000081da8000> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.park(java.base@17.0.9/Unknown Source)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.9/Unknown Source)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@17.0.9/Unknown Source)
    at java.util.concurrent.CountDownLatch.await(java.base@17.0.9/Unknown Source)
    at org.apache.solr.servlet.CoreContainerProvider$ContextInitializationKey.waitForReadyService(CoreContainerProvider.java:523)
    at org.apache.solr.servlet.CoreContainerProvider$ServiceHolder.getService(CoreContainerProvider.java:562)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:148)
    at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:133)
    at org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:725)
    at org.eclipse.jetty.servlet.ServletHandler$$Lambda$315/0x00007f62fc2674b8.accept(Unknown Source)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(java.base@17.0.9/Unknown Source)
    at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(java.base@17.0.9/Unknown Source)
    at java.util.stream.ReferencePipeline$Head.forEach(java.base@17.0.9/Unknown Source)
    at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749)
    at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392)

This looks related to changes for
https://issues.apache.org/jira/browse/SOLR-15590
Just guessing, but this could also be causing
https://issues.apache.org/jira/browse/SOLR-16086
I've looked into the code, and the idea seems to be that
ContextInitializationKey.waitForReadyService is unblocked by
CoreContainerProvider#init, which calls ServiceHolder#setService.
This should work because CoreContainerProvider#init is always called
before SolrDispatchFilter#init (ServletContextListeners are initialized
before Filters).
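
To make the intended handshake concrete, here is a minimal standalone
sketch. It is not the Solr code: LatchHolderSketch is a made-up stand-in
for ServiceHolder, the service is just a String, and I added a timeout
so the sketch cannot hang (the real code, judging by the stack trace,
uses the untimed CountDownLatch.await).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for ServiceHolder; names and types are assumptions.
class LatchHolderSketch {
  private final CountDownLatch ready = new CountDownLatch(1);
  private volatile String service;

  // Plays the role of ServiceHolder#setService: publish, then release waiters.
  void setService(String service) {
    this.service = service;
    ready.countDown();
  }

  // Plays the role of ServiceHolder#getService, but with a timeout.
  // Returns null if nobody called setService on THIS holder in time.
  String getService(long timeoutMillis) {
    try {
      return ready.await(timeoutMillis, TimeUnit.MILLISECONDS) ? service : null;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }

  public static void main(String[] args) {
    LatchHolderSketch initialized = new LatchHolderSketch();
    initialized.setService("coreContainer");          // listener runs first
    System.out.println(initialized.getService(100));  // filter returns at once

    LatchHolderSketch fresh = new LatchHolderSketch(); // never made ready
    System.out.println(fresh.getService(100));         // gives up after the timeout
  }
}
```

The handshake only works if the waiter and the initializer share the
same holder instance. A freshly created holder has a latch that nobody
will ever count down.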
But there's a problem: CoreContainerProvider#init stores the
ContextInitializationKey and the mapped ServiceHolder in
CoreContainerProvider#services, and that's a *WeakHashMap*:

    services
        .computeIfAbsent(new ContextInitializationKey(servletContext), ServiceHolder::new)
        .setService(this);

The key is not referenced anywhere else, which makes the mapping a
candidate for garbage collection. The ServiceHolder value also does not
reference the key anymore, because #setService cleared the reference.
With bad luck, the mapping is already gone from the WeakHashMap before
SolrDispatchFilter#init tries to retrieve it with
CoreContainerProvider#serviceForContext. And that method will then
create a new ContextInitializationKey and ServiceHolder, which is then
used for #waitForReadyService. But such a new ContextInitializationKey
has never received a #makeReady call, and #waitForReadyService will
block forever.
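
The effect can be reproduced outside of Solr with a small sketch. Key
and Holder below are made-up stand-ins for ContextInitializationKey and
ServiceHolder, and I am assuming the key's equality is based on the
wrapped context. Whether the first mapping disappears depends on the
garbage collector (System.gc() is only a hint), so the first two lines
of output can differ between runs and JVMs.

```java
import java.lang.ref.Reference;
import java.util.Map;
import java.util.WeakHashMap;

class WeakKeySketch {
  // Hypothetical key: equal to any other key wrapping the same context.
  static final class Key {
    final Object context;
    Key(Object context) { this.context = context; }
    @Override public boolean equals(Object o) {
      return o instanceof Key && ((Key) o).context == context;
    }
    @Override public int hashCode() { return System.identityHashCode(context); }
  }

  // Hypothetical holder; it does not reference its key.
  static final class Holder {
    volatile boolean ready;
  }

  // Like the init path: create the mapping and mark the holder ready.
  static Key register(Map<Key, Holder> services, Object context) {
    Key key = new Key(context);
    services.computeIfAbsent(key, k -> new Holder()).ready = true;
    return key;
  }

  // Like the filter path: look up with a NEW key, creating a holder if absent.
  static Holder lookup(Map<Key, Holder> services, Object context) {
    return services.computeIfAbsent(new Key(context), k -> new Holder());
  }

  public static void main(String[] args) throws InterruptedException {
    Object context = new Object();

    Map<Key, Holder> services = new WeakHashMap<>();
    register(services, context); // returned key is dropped: nothing references it
    for (int i = 0; i < 20 && !services.isEmpty(); i++) {
      System.gc();
      Thread.sleep(50);
    }
    System.out.println("mapping survived: " + !services.isEmpty());
    System.out.println("lookup sees ready holder: " + lookup(services, context).ready);

    // Contrast: while the key is strongly reachable, the mapping must stay.
    Map<Key, Holder> pinned = new WeakHashMap<>();
    Key strongKey = register(pinned, context);
    System.gc();
    System.out.println("pinned lookup sees ready holder: " + lookup(pinned, context).ready);
    Reference.reachabilityFence(strongKey);
  }
}
```

If the first mapping is collected, the lookup silently creates a second
holder that was never marked ready, which corresponds to the latch that
is never counted down in the thread dump above.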
Do you think that makes sense? Or did I miss something?
I'll create a JIRA ticket in the next few days, if that's okay.
Best
Andreas