[ https://issues.apache.org/jira/browse/SOLR-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller reassigned SOLR-5691:
---------------------------------

    Assignee: Mark Miller

> Unsynchronized WeakHashMap in SolrDispatchFilter causing issues in SolrCloud
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-5691
>                 URL: https://issues.apache.org/jira/browse/SOLR-5691
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.6.1
>            Reporter: Bojan Smid
>            Assignee: Mark Miller
>             Fix For: 5.0, 4.7
>
>
> I have a large SolrCloud setup: 7 nodes, each hosting a few thousand cores
> (leaders and replicas of the same shard live on different nodes), which may
> make the problem easier to notice.
> A node can randomly get into a state where it stops responding to PeerSync
> /get requests from other nodes. When that happens, a thread dump of that node
> shows multiple entries like this one (one entry for each blocked request
> from another node; they do not go away over time):
> "http-bio-8080-exec-1781" daemon prio=5 tid=0x440177200000 nid=0x25ae  [ JVM locked by VM at safepoint, polling bits: safep ]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.WeakHashMap.get(WeakHashMap.java:471)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> WeakHashMap's internal state can easily get corrupted when it is used without
> synchronization, in which case it is known to enter an infinite loop in the
> .get() call. It is very likely that this is what happens here. The reason
> others may not see this issue could be the huge number of cores I have in
> this system. The problem usually appears while some node is starting. It
> does not happen on every start; it evidently depends on the exact timing of
> the events that lead to the map's corruption.
> The fix may be as simple as changing:
>   protected final Map<SolrConfig, SolrRequestParsers> parsers =
>       new WeakHashMap<SolrConfig, SolrRequestParsers>();
> to:
>   protected final Map<SolrConfig, SolrRequestParsers> parsers =
>       Collections.synchronizedMap(
>           new WeakHashMap<SolrConfig, SolrRequestParsers>());
> but there may be performance considerations, since this code is the entry
> point into Solr.
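
The proposed change can be illustrated with a minimal, self-contained sketch. It is not the actual Solr patch: Config and Parsers are hypothetical stand-ins for SolrConfig and SolrRequestParsers, and parsersFor is an assumed name for a lookup that creates the parsers on a miss. The wrapper from Collections.synchronizedMap makes each individual call safe; the sketch additionally holds the map's own monitor around the get-then-put pair so that two threads cannot both create parsers for the same config.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ParserCacheSketch {

    // Hypothetical stand-in for SolrConfig (identity-based equals/hashCode).
    static final class Config {
        final String name;
        Config(String name) { this.name = name; }
    }

    // Hypothetical stand-in for SolrRequestParsers. It deliberately keeps no
    // reference to its Config, so the weak key can still be collected.
    static final class Parsers {
        final String coreName;
        Parsers(String coreName) { this.coreName = coreName; }
    }

    // The change suggested in the report: wrap the WeakHashMap.
    private final Map<Config, Parsers> parsers =
        Collections.synchronizedMap(new WeakHashMap<Config, Parsers>());

    // Look up the parsers for a config, creating them on a miss.
    Parsers parsersFor(Config config) {
        // The synchronized wrapper locks on itself, so taking the same
        // monitor here makes the check-then-put sequence atomic.
        synchronized (parsers) {
            Parsers p = parsers.get(config);
            if (p == null) {
                p = new Parsers(config.name);
                parsers.put(config, p);
            }
            return p;
        }
    }

    // Hammer the cache from several threads and count the distinct Parsers
    // instances handed out; one per config means no duplicates were created.
    static int distinctParsersUnderContention(int threadCount, int configCount) {
        final ParserCacheSketch cache = new ParserCacheSketch();
        final List<Config> configs = new ArrayList<>();
        for (int i = 0; i < configCount; i++) {
            configs.add(new Config("core" + i));
        }
        final Set<Parsers> seen =
            Collections.newSetFromMap(new ConcurrentHashMap<Parsers, Boolean>());

        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int t = 0; t < threadCount; t++) {
            pool.execute(() -> {
                for (Config c : configs) {
                    seen.add(cache.parsersFor(c));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("lookups did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctParsersUnderContention(8, 1000));
    }
}
```

Every request passes through this lock, which is the performance concern raised in the report. The critical section is only a map lookup on the common path, but whether that is acceptable at the entry point into Solr would need measuring.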



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)