[ 
https://issues.apache.org/jira/browse/SOLR-14452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097138#comment-17097138
 ] 

David Smiley commented on SOLR-14452:
-------------------------------------

I think your master branch is out of date.  On March 17th, in SOLR-14256, I 
fixed this bug, which had been around for a month.

> "classloading deadlock" issue with DocSet/SortedIntDocSet
> ---------------------------------------------------------
>
>                 Key: SOLR-14452
>                 URL: https://issues.apache.org/jira/browse/SOLR-14452
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> While beasting some facet related cloud tests on master, I noticed a pattern 
> of occasional failures that seemed to crop up...
>  * test ultimately fails due to a time out (usually the client threads time 
> out waiting for a server response)
>  * if I notice my CPU isn't spinning very hard _before_ the test fails, I can 
> capture a jstack and inspect some threads
>  * there will be multiple jetty/solr request threads (ex: 
> {{"qtp82184175-145"}}) whose stack traces show various stages of DocSet 
> collection and show they are {{"... in Object.wait()"}} but also {{RUNNABLE}}
> ...this isn't a thread summary+state combination that I'm used to seeing when 
> looking at thread dumps, and some research into when/why this might happen 
> led me to:
>  * [https://stackoverflow.com/questions/28631656/runnable-thread-state-but-in-object-wait]
>  ** [https://stackoverflow.com/a/28776438/689372]
>  *** [http://ternarysearch.blogspot.com/2013/07/static-initialization-deadlock.html]
>  *** [https://bugs.openjdk.java.net/browse/JDK-8037567]
> ...while the comments/status of JDK-8037567 suggest "nothing wrong here", the 
> overall symptoms/description of the problem in the SO answer and linked blog, 
> and the summation that this is essentially a "deadlock" situation in the class 
> loader, do seem to correlate to some of the specifics I can see in the stack 
> traces when this happens while running solr tests...
>  * at least one "RUNNABLE / Object.wait" thread trying to do class init; 
> class: DocSet...
> {noformat}
> "qtp1535326437-68" #68 prio=5 os_prio=0 cpu=72.48ms elapsed=241.69s tid=0x00007fc08c0a4000 nid=0x864 in Object.wait()  [0x00007fc0adedd000]
>    java.lang.Thread.State: RUNNABLE
>       at org.apache.solr.search.DocSet.<clinit>(DocSet.java:118)
>       at org.apache.solr.search.DocSetCollector.getDocSet(DocSetCollector.java:90) // "new BitDocSet(..)"
>       at org.apache.solr.search.DocSetUtil.getDocSet(DocSetUtil.java:93)
>       at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1730)
> {noformat}
>  * other "RUNNABLE / Object.wait" threads are on lines that involve 
> instantiating a subclass of DocSet:
>  ** 
> {noformat}
> "qtp1535326437-67" #67 prio=5 os_prio=0 cpu=801.44ms elapsed=241.69s tid=0x00007fc08c0a1800 nid=0x863 in Object.wait()  [0x00007fc0adfdf000]
>    java.lang.Thread.State: RUNNABLE
>       at org.apache.solr.search.DocSetCollector.getDocSet(DocSetCollector.java:90) // "new BitDocSet(..)"
>       at org.apache.solr.search.DocSetUtil.getDocSet(DocSetUtil.java:93)
>       at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1730)
> {noformat}
>  ** 
> {noformat}
> "qtp82184175-65" #65 prio=5 os_prio=0 cpu=137.76ms elapsed=241.69s tid=0x00007fc088092000 nid=0x860 in Object.wait()  [0x00007fc0ae2e2000]
>    java.lang.Thread.State: RUNNABLE
>       at org.apache.solr.search.DocSetCollector.getDocSet(DocSetCollector.java:84) // "new SortedIntDocSet(..)"
>       at org.apache.solr.search.DocSetUtil.getDocSet(DocSetUtil.java:93)
>       at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1730)
>       at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1433)
> {noformat}
>  ** etc...
>  * DocSet has a static reference to a concrete subclass...
>  ** {{public static final DocSet EMPTY = new SortedIntDocSet(new int[0], 0);}}
> ----
> I should point out:
> * While this particular "class loading deadlock" issue seems more likely to 
> happen in a "test" situation where the JVMs/classloaders are short lived, 
> there's no reason to assume this type of failure couldn't happen in a 
> production solr instance when handling a burst of queries right after startup.
> * This type of failure (either specifically due to "DocSet vs 
> SortedIntDocSet", or due to similar patterns in other classes) may also be 
> the root cause of various other hard to reproduce "timed out" test failures 
> we've seen over the years.
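
For readers who want to see the pattern described in the quoted report in isolation, here is a minimal sketch, not Solr code. The names Base, Derived, runDemo and the 300 ms pause are all hypothetical and exist only for illustration. Base holds a static reference to an instance of its own subclass (as DocSet.EMPTY does in the report), one thread starts initializing Base while a second thread starts initializing Derived, and each ends up waiting for the class the other thread is initializing. The threads are daemons and are joined with a timeout so the program still exits.

```java
import java.util.concurrent.CountDownLatch;

public class ClassInitDeadlockDemo {

    // Counted down once Base's static initializer has started on thread A.
    static final CountDownLatch baseInitStarted = new CountDownLatch(1);

    // Hypothetical stand-in for a base type with a static reference to a subclass.
    static class Base {
        static final Base EMPTY;
        static {
            baseInitStarted.countDown();
            pause(300);             // assumption: long enough for thread B to begin initializing Derived
            EMPTY = new Derived();  // requires Derived to be initialized
        }
    }

    // Hypothetical stand-in for the concrete subclass; initializing it requires Base first.
    static class Derived extends Base { }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static void joinQuietly(Thread t, long millis) {
        try {
            t.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if both threads are still blocked in class initialization after the timeout. */
    static boolean runDemo() {
        Thread a = new Thread(() -> {
            Object ignored = Base.EMPTY;   // first use of Base: starts Base initialization
        }, "init-base");
        Thread b = new Thread(() -> {
            try {
                baseInitStarted.await();   // wait until thread A is inside Base's initializer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            new Derived();                 // starts Derived initialization, which then waits for Base
        }, "init-derived");
        a.setDaemon(true);                 // daemon threads so the JVM can exit despite the deadlock
        b.setDaemon(true);
        a.start();
        b.start();
        joinQuietly(a, 1500);
        joinQuietly(b, 200);
        return a.isAlive() && b.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads stuck in class init: " + runDemo());
    }
}
```

Taking a jstack of this process while it is stuck should show both threads blocked inside class initialization, which can be compared against the dumps in the report. One commonly used way to break such a cycle is to move the subclass instance into a separate nested holder class, so that the base class initializer no longer touches the subclass; whether that is what SOLR-14256 did is not stated in this thread.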


