[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261732#comment-15261732 ]

Stephan Lagraulet commented on SOLR-6406:

Hi [~yo...@apache.org] did you make any progress on this issue?

> ConcurrentUpdateSolrServer hang in blockUntilFinished.
>
> Key: SOLR-6406
> URL: https://issues.apache.org/jira/browse/SOLR-6406
> Project: Solr
> Issue Type: Bug
> Reporter: Mark Miller
> Assignee: Yonik Seeley
> Fix For: 5.5, master
>
> Attachments: CPU Sampling.png, SOLR-6406.patch, SOLR-6406.patch,
> SOLR-6406.patch
>
> Not sure what is causing this, but SOLR-6136 may have taken us a step back
> here. I see this problem occasionally pop up in ChaosMonkeyNothingIsSafeTest
> now - test fails because of a thread leak, thread leak is due to a
> ConcurrentUpdateSolrServer hang in blockUntilFinished. Only started popping
> up recently.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009016#comment-15009016 ]

Yonik Seeley commented on SOLR-6406:

So increasing maxConnectionsPerHost didn't fix the problem. I instrumented ConcurrentUpdateSolrServer to understand what is happening and when, and I am analyzing some of those failures now. The resulting logs are all larger than the maximum size that can be uploaded to JIRA, though, so I'll just post a summary based on what I find.
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001429#comment-15001429 ]

Yonik Seeley commented on SOLR-6406:

I was analyzing another "shards-out-of-sync" failure on trunk. It looks like certain updates are simply not being forwarded from the leader to a certain replica.

Working theory: the HttpClient's max connections per host limit is being hit, which starves updates from certain update threads. This could explain why shutdownNow on the update executor service has such an impact. In an orderly shutdown, all scheduled jobs will still be run (I think), which means that connections will be released and the updates that were being starved will get to proceed. But that is exactly why we should probably keep the shutdownNow... it mimics much better what will happen in real-world situations when a node goes down.

From this, it looks like max connections per host is 20:
{code}
13404 INFO (TEST-HdfsChaosMonkeyNothingIsSafeTest.test-seed#[A22375CC545D2B82]) [] o.a.s.h.c.HttpShardHandlerFactory created with socketTimeout : 9,urlScheme : ,connTimeout : 15000,maxConnectionsPerHost : 20,maxConnections : 1,corePoolSize : 0,maximumPoolSize : 2147483647,maxThreadIdleTime : 5,sizeOfQueue : -1,fairnessPolicy : false,useRetries : false,
{code}
The test used 12 nodes (and 2 shards). Because all nodes run on the same host, that increases the chance of hitting the max connections limit.
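A minimal sketch of the starvation theory above. It assumes the per-host connection limit behaves like a fair counting semaphore with one permit per open connection. The class and method names (PerHostLimitDemo, tryNewUpdate) are hypothetical, and the sketch does not use HttpClient. It only shows how 20 long-lived streaming requests to a single host can keep a further update thread from getting a connection until they release theirs.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a per-host connection limit; not HttpClient's API. */
public class PerHostLimitDemo {

    /**
     * Tries to obtain a connection for one update within waitMs.
     * Returns false if every connection to the host stayed busy (starved).
     */
    static boolean tryNewUpdate(Semaphore hostConnections, long waitMs) {
        try {
            if (!hostConnections.tryAcquire(waitMs, TimeUnit.MILLISECONDS)) {
                return false; // no connection freed up in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        hostConnections.release(); // "send" the update, then return the connection
        return true;
    }

    public static void main(String[] args) {
        int maxConnectionsPerHost = 20; // value from the HttpShardHandlerFactory log line
        Semaphore hostConnections = new Semaphore(maxConnectionsPerHost, true);

        // Long-lived streaming requests to the single test host hold every connection.
        hostConnections.acquireUninterruptibly(maxConnectionsPerHost);
        System.out.println("starved while all connections are held: "
                + !tryNewUpdate(hostConnections, 100));

        // Once those requests finish, connections are released and the update proceeds.
        hostConnections.release(maxConnectionsPerHost);
        System.out.println("proceeds after release: " + tryNewUpdate(hostConnections, 100));
    }
}
```

The model ignores HttpClient details such as connection leasing timeouts. It only captures the "all permits held by long-lived streams" situation the theory describes.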
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997248#comment-14997248 ]

Mark Miller commented on SOLR-6406:

Strange. I originally got over 300 runs with it without a single out-of-sync failure. I have not tried it on recent trunk or with the recent changes, though.
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995263#comment-14995263 ]

Yonik Seeley commented on SOLR-6406:

The other variants:
4) trunk with only the client changes reverted (i.e. DUH check enabled, shutdownNow used)
5) trunk with the client + DUH changes reverted (i.e. only shutdownNow enabled)
6) trunk with alternate client changes (only changes to blockUntilFinished)
7) trunk with alternate client changes 2 (only changes to blockUntilFinished, but using the former isTerminated instead of isShutdown)

I got inconsistent-shard failures with all of these. The common element is that shutdownNow is used on the shard update executor.
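For reference, the two checks in variant 7 come from java.util.concurrent.ExecutorService and behave differently. isShutdown() becomes true as soon as shutdown() or shutdownNow() is called. isTerminated() only becomes true once every task has finished after that shutdown. The JDK-only sketch below uses a hypothetical class name. It samples both flags while a task is still running after shutdown(), then checks again once the task completes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo of isShutdown() vs isTerminated(); not Solr code. */
public class ShutdownStateDemo {

    /** Waits on a latch, converting interruption into an unchecked exception. */
    static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns {isShutdown, isTerminated} sampled while a task is still running
     * after shutdown(), plus whether the executor terminated once the task ended.
     */
    static boolean[] sampleStates() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);
        executor.execute(() -> {
            started.countDown();
            await(finish); // simulates a runner that is still streaming updates
        });
        await(started);

        executor.shutdown(); // no new tasks are accepted; the running task continues
        boolean shutdown = executor.isShutdown();
        boolean terminatedEarly = executor.isTerminated();

        finish.countDown(); // let the running task complete
        boolean terminatedLater;
        try {
            terminatedLater = executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new boolean[] {shutdown, terminatedEarly, terminatedLater};
    }

    public static void main(String[] args) {
        boolean[] s = sampleStates();
        System.out.println("isShutdown=" + s[0] + " isTerminated(while running)=" + s[1]
                + " isTerminated(after)=" + s[2]);
    }
}
```

A loop that waits for isTerminated() therefore keeps going while runners are still active. A loop that checks isShutdown() can stop as soon as shutdown starts.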
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991961#comment-14991961 ]

Yonik Seeley commented on SOLR-6406:

Update: I looped 3 tests overnight...
1) trunk with shutdownNow on the update executor reverted
2) trunk with shutdownNow and the client changes in this patch reverted
3) plain trunk

Only #3 resulted in inconsistent shards. I'm setting up some new variants to test now...
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985389#comment-14985389 ]

ASF subversion and git services commented on SOLR-6406:

Commit 1712045 from [~yo...@apache.org] in branch 'dev/trunk'
[ https://svn.apache.org/r1712045 ]

SOLR-6406: fix race/hang in ConcurrentUpdateSolrClient.blockUntilFinished when executor service is shut down
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985392#comment-14985392 ]

ASF subversion and git services commented on SOLR-6406:

Commit 1712047 from [~yo...@apache.org] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1712047 ]

SOLR-6406: fix race/hang in ConcurrentUpdateSolrClient.blockUntilFinished when executor service is shut down
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984535#comment-14984535 ]

Yonik Seeley commented on SOLR-6406:

OK, so I haven't hit any hangs with this latest patch and shutdownNow() on the associated executor. Interestingly enough, though, this still results in inconsistent-shard failures. My guess is that the executor is shut down as one of the last steps in CoreContainer.shutdown(), which still leaves time for streaming update requests to keep streaming.
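This small JDK-only sketch shows why shutdownNow() behaves so differently from an orderly shutdown. shutdown() still runs tasks that are already queued. shutdownNow() interrupts the running task and hands back queued tasks without running them. The class name is hypothetical. The sketch models a busy runner plus one queued runner on a single-thread executor, not Solr's actual update executor.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo contrasting shutdown() and shutdownNow() on a JDK executor. */
public class ShutdownVsShutdownNow {

    /** Returns {queued tasks that ran, queued tasks handed back unrun}. */
    static int[] runScenario(boolean useShutdownNow) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger queuedRan = new AtomicInteger();

        // A busy "runner" occupying the only worker thread.
        executor.execute(() -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdownNow() interrupts it here
            }
        });
        // A second "runner" that can only wait in the executor's queue.
        executor.execute(queuedRan::incrementAndGet);

        int handedBack = 0;
        try {
            started.await();
            if (useShutdownNow) {
                List<Runnable> neverStarted = executor.shutdownNow(); // drains the queue
                handedBack = neverStarted.size();
            } else {
                executor.shutdown(); // already-queued tasks will still run
            }
            release.countDown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {queuedRan.get(), handedBack};
    }

    public static void main(String[] args) {
        int[] orderly = runScenario(false);
        int[] abrupt = runScenario(true);
        System.out.println("shutdown():    ran=" + orderly[0] + " handedBack=" + orderly[1]);
        System.out.println("shutdownNow(): ran=" + abrupt[0] + " handedBack=" + abrupt[1]);
    }
}
```

In the shutdownNow() case, any work still sitting in the queue is silently dropped unless the caller inspects the returned list. That matches the theory that an abrupt shutdown can leave updates unsent.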
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983761#comment-14983761 ]

Mark Miller commented on SOLR-6406:

Ha, that's actually close to the first hack fix I made: occasionally waking up in the wait and re-checking whether it was empty.
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980772#comment-14980772 ]

Mark Miller commented on SOLR-6406:

Just two threads stuck, not necessarily from the same client. Previously I had only ever seen one thread stuck. Just noting it; it may not mean much.
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980812#comment-14980812 ]

Yonik Seeley commented on SOLR-6406:

OK, I was able to reproduce it. Interestingly, this is pretty easy to hit. I also saw 2 threads stuck at the same point, which, as you say, must be 2 different client objects. There must be something more here than a subtle little race condition.
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974518#comment-14974518 ]

Yonik Seeley commented on SOLR-6406:

bq. Got a hang at the same spot - 2 threads stuck on it this time rather than the usual 1:

Hmmm, yeah, I only fixed one of the spots where runners are submitted. I'll take another crack at it...
Although having 2 threads stuck at the same place in blockUntilFinished should be impossible... it's a synchronized method.
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973243#comment-14973243 ]

Mark Miller commented on SOLR-6406:

I'm testing Yonik's approach today to see if it resolves this. FWIW, my quick patch does not.
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973270#comment-14973270 ]

Mark Miller commented on SOLR-6406:

Got a hang at the same spot - 2 threads stuck on it this time rather than the usual 1:
{noformat}
[junit4]>2) Thread[id=1071, name=qtp612828486-1071, state=WAITING, group=TGRP-HdfsChaosMonkeyNothingIsSafeTest]
[junit4]> at java.lang.Object.wait(Native Method)
[junit4]> at java.lang.Object.wait(Object.java:502)
[junit4]> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient.blockUntilFinished(ConcurrentUpdateSolrClient.java:418)
[junit4]> at org.apache.solr.update.StreamingSolrClients.blockUntilFinished(StreamingSolrClients.java:106)
[junit4]> at org.apache.solr.update.SolrCmdDistributor.blockAndDoRetries(SolrCmdDistributor.java:231)
[junit4]> at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:89)
[junit4]> at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:778)
[junit4]> at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1622)
[junit4]> at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:183)
[junit4]> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
[junit4]> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:151)
[junit4]> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2079)
[junit4]> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:667)
[junit4]> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
[junit4]> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
[junit4]> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
{noformat}
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972845#comment-14972845 ]

Mark Miller commented on SOLR-6406:

Ha, I hadn't refreshed my browser. I'll review this approach.
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972813#comment-14972813 ]

Mark Miller commented on SOLR-6406:

A more recent set of stack traces:
{noformat}
[junit4] ERROR 0.00s | HdfsChaosMonkeyNothingIsSafeTest (suite) <<<
[junit4]> Throwable #1: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=39 closes=38
[junit4]>at __randomizedtesting.SeedInfo.seed([71608A03B4692CB]:0)
[junit4]>at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:468)
[junit4]>at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:234)
[junit4]>at java.lang.Thread.run(Thread.java:745)Throwable #2: com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from SUITE scope at org.apache.solr.cloud.hdfs.HdfsChaosMonkeyNothingIsSafeTest:
[junit4]>1) Thread[id=243, name=qtp487431535-243, state=WAITING, group=TGRP-HdfsChaosMonkeyNothingIsSafeTest]
[junit4]> at java.lang.Object.wait(Native Method)
[junit4]> at java.lang.Object.wait(Object.java:502)
[junit4]> at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient.blockUntilFinished(ConcurrentUpdateSolrClient.java:404)
[junit4]> at org.apache.solr.update.StreamingSolrClients.blockUntilFinished(StreamingSolrClients.java:103)
[junit4]> at org.apache.solr.update.SolrCmdDistributor.blockAndDoRetries(SolrCmdDistributor.java:231)
[junit4]> at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:89)
[junit4]> at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:778)
[junit4]> at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1622)
[junit4]> at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:183)
[junit4]> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
[junit4]> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:151)
[junit4]> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2079)
[junit4]> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:667)
[junit4]> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
[junit4]> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
[junit4]> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
[junit4]> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
[junit4]> at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:109)
[junit4]> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
[junit4]> at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83)
[junit4]> at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:300)
[junit4]> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
[junit4]> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
[junit4]> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
[junit4]> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
[junit4]> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
[junit4]> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
[junit4]> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
[junit4]> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
[junit4]> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
[junit4]> at org.eclipse.jetty.server.Server.handle(Server.java:499)
[junit4]> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
[junit4]> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
[junit4]> at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
[junit4]> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
[junit4]> at
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972821#comment-14972821 ]

Yonik Seeley commented on SOLR-6406:

OK, here's one theory after a quick look:
{code}
} finally {
  synchronized (runners) {
    if (runners.size() == 1 && !queue.isEmpty()) {
      // keep this runner alive
      scheduler.execute(this);
    } else {
      runners.remove(this);
      if (runners.isEmpty())
        runners.notifyAll();
    }
  }
{code}
What if the queue isn't empty, so we call scheduler.execute(), but the scheduler has already been shut down? That call will throw an exception, and the else block containing notifyAll() will never be executed.
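This self-contained sketch reproduces the failure mode described above. It assumes the scheduler is a standard JDK executor with its default rejection policy, so execute() on a shut-down executor throws RejectedExecutionException. RunnerExitDemo, exitBuggy, exitFixed and queueHasUpdates are hypothetical names. exitFixed shows one possible way to make the exit path always reach runners.remove()/notifyAll(). It is not necessarily what the committed fix does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical reduction of the runner exit path quoted above. */
public class RunnerExitDemo {

    final List<Runnable> runners = new ArrayList<>();
    final ExecutorService scheduler = Executors.newSingleThreadExecutor();
    final boolean queueHasUpdates = true; // stands in for !queue.isEmpty()

    /** Mirrors the quoted finally block. */
    void exitBuggy(Runnable self) {
        synchronized (runners) {
            if (runners.size() == 1 && queueHasUpdates) {
                scheduler.execute(self); // throws RejectedExecutionException after shutdown
            } else {
                runners.remove(self);
                if (runners.isEmpty()) runners.notifyAll();
            }
        }
    }

    /** One possible fix: if re-scheduling is rejected, fall through to cleanup. */
    void exitFixed(Runnable self) {
        synchronized (runners) {
            if (runners.size() == 1 && queueHasUpdates) {
                try {
                    scheduler.execute(self);
                    return; // runner stays registered and will run again
                } catch (RejectedExecutionException e) {
                    // Scheduler is shut down: nothing will run this runner again.
                }
            }
            runners.remove(self);
            if (runners.isEmpty()) runners.notifyAll(); // in the client, wakes blockUntilFinished()
        }
    }

    /** Returns true if the runner was removed, i.e. a waiter would be notified. */
    static boolean runnersClearedAfterShutdown(boolean useFix) {
        RunnerExitDemo demo = new RunnerExitDemo();
        Runnable self = () -> { };
        synchronized (demo.runners) {
            demo.runners.add(self);
        }
        demo.scheduler.shutdown();
        try {
            if (useFix) {
                demo.exitFixed(self);
            } else {
                demo.exitBuggy(self);
            }
        } catch (RejectedExecutionException e) {
            // Buggy path: the exception escapes before remove()/notifyAll() run.
        }
        synchronized (demo.runners) {
            return demo.runners.isEmpty();
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy exit clears runners: " + runnersClearedAfterShutdown(false));
        System.out.println("fixed exit clears runners: " + runnersClearedAfterShutdown(true));
    }
}
```

In the buggy variant, the runner stays in the list forever and no notifyAll() is issued. Any thread already parked in runners.wait() would therefore hang, which matches the stack traces above.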
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701450#comment-14701450 ]

Stephan Lagraulet commented on SOLR-6406:

I have attached a CPU sample of a SolrCloud server that has had very poor update performance for the past few hours. I suspect it could be related to this problem.
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297238#comment-14297238 ]

Timothy Potter commented on SOLR-6406:

Would it make sense to change the code to wait(timeoutMs)? Then we could recheck the state of things and go back to waiting if that makes sense, instead of the indefinite wait you're seeing.
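Here is a minimal sketch of the bounded-wait idea, assuming the caller waits on the runners monitor as blockUntilFinished does. TimedWaitDemo and sliceMs are hypothetical names. The harness deliberately removes the runner without calling notifyAll() to simulate the lost wakeup. A plain runners.wait() would hang in that situation. The timed wait instead re-checks the condition after each slice and returns.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of waiting in bounded slices instead of forever. */
public class TimedWaitDemo {

    final List<Object> runners = new ArrayList<>();

    /**
     * Waits until runners is empty, re-checking every sliceMs milliseconds
     * (sliceMs must be > 0; Object.wait(0) would wait indefinitely).
     */
    void blockUntilFinished(long sliceMs) throws InterruptedException {
        synchronized (runners) {
            while (!runners.isEmpty()) {
                runners.wait(sliceMs); // returns on notify, timeout, or spurious wakeup
            }
        }
    }

    /** Simulates a runner that exits without calling notifyAll(). */
    static boolean finishesDespiteMissedNotify() {
        TimedWaitDemo demo = new TimedWaitDemo();
        Object runner = new Object();
        demo.runners.add(runner); // added before the thread starts, so it is visible to it
        Thread exiting = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                return;
            }
            synchronized (demo.runners) {
                demo.runners.remove(runner); // deliberately no notifyAll()
            }
        });
        exiting.start();
        try {
            demo.blockUntilFinished(10);
            exiting.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        synchronized (demo.runners) {
            return demo.runners.isEmpty();
        }
    }

    public static void main(String[] args) {
        System.out.println("returned despite missed notify: " + finishesDespiteMissedNotify());
    }
}
```

This only limits the damage from a missed notification to one slice of latency. It does not explain why the notification was missed. If the runner never leaves the list at all, as in the rejected-execute scenario, a timed wait would still loop forever unless it also checks something else, such as whether the executor has been shut down.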
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297242#comment-14297242 ]

Mark Miller commented on SOLR-6406:

Maybe. I've been trying to spot how it can happen (without a runner also still going, which I don't see). So far, I haven't been able to.
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297214#comment-14297214 ] Mark Miller commented on SOLR-6406: --- I still see this happen in tests. It hangs at runners.wait(); no notify or anything else ever arrives, and it's just an ugly hang.
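The hang described here is a `wait()` whose matching notification never arrives. Without claiming this is the cause in ConcurrentUpdateSolrServer, one common way that happens is a worker that exits through an exception before reaching its `notifyAll()`. A hedged sketch of the defensive pattern, using a hypothetical `RunnerPool` (not SolrJ code): register under the lock before starting, and deregister and notify in a `finally` block so every exit path wakes the waiter:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical runner pool illustrating the guarded-wait pattern; not SolrJ's actual code.
class RunnerPool {
    private final List<Thread> runners = new ArrayList<>();

    void submit(Runnable work) {
        Thread t = new Thread(() -> {
            try {
                work.run();
            } finally {
                // Deregister and notify on EVERY exit path, including exceptions,
                // so a waiter in blockUntilFinished() cannot miss the last notification.
                synchronized (runners) {
                    runners.remove(Thread.currentThread());
                    runners.notifyAll();
                }
            }
        });
        // Register before start(), under the same lock the waiter uses,
        // so the waiter can never observe an empty list while work is pending.
        synchronized (runners) {
            runners.add(t);
        }
        t.start();
    }

    void blockUntilFinished() throws InterruptedException {
        synchronized (runners) {
            // Always wait in a loop on the guarded state, never on a bare wait().
            while (!runners.isEmpty()) {
                runners.wait();
            }
        }
    }
}
```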
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106906#comment-14106906 ] Mark Miller commented on SOLR-6406: --- {noformat}
1) Thread[id=55, name=qtp823025155-55, state=WAITING, group=TGRP-ChaosMonkeyNothingIsSafeTest]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:374)
        at org.apache.solr.update.StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:103)
        at org.apache.solr.update.SolrCmdDistributor.blockAndDoRetries(SolrCmdDistributor.java:228)
        at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:89)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:766)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1662)
        at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
{noformat}
ConcurrentUpdateSolrServer hang in blockUntilFinished. -- Key: SOLR-6406 URL: https://issues.apache.org/jira/browse/SOLR-6406 Project: Solr Issue Type: Bug Reporter: Mark Miller Fix For: 5.0, 4.11 Not sure what is causing this, but SOLR-6136 may have taken us a step back here. I see this problem occasionally pop up in ChaosMonkeyNothingIsSafeTest now - test fails because of a thread leak, thread leak is due to a ConcurrentUpdateSolrServer hang in blockUntilFinished. Only started popping up recently.
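The test fails because of a leaked thread parked in `blockUntilFinished`, as in the dump above. As a diagnostic aid only, here is a sketch of a hypothetical helper that scans `Thread.getAllStackTraces()` for threads in `WAITING` or `TIMED_WAITING` state with a given method on their stack. It is not part of Solr or its test framework, and because each thread's state is read separately from the stack snapshot, the result is approximate:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper; not part of Solr or its test framework.
final class StuckThreadFinder {
    private StuckThreadFinder() {}

    /** Returns the names of live threads that are WAITING or TIMED_WAITING with methodName on their stack. */
    static List<String> findWaitingIn(String methodName) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            // State is sampled after the stack snapshot, so this check is best-effort.
            Thread.State state = e.getKey().getState();
            if (state != Thread.State.WAITING && state != Thread.State.TIMED_WAITING) {
                continue;
            }
            for (StackTraceElement frame : e.getValue()) {
                if (frame.getMethodName().equals(methodName)) {
                    stuck.add(e.getKey().getName());
                    break;
                }
            }
        }
        return stuck;
    }
}
```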
[jira] [Commented] (SOLR-6406) ConcurrentUpdateSolrServer hang in blockUntilFinished.
[ https://issues.apache.org/jira/browse/SOLR-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106909#comment-14106909 ] Mark Miller commented on SOLR-6406: --- This was on a nightly run of ChaosMonkeyNothingIsSafeTest. It's fairly rare.