Andrzej Bialecki  created SOLR-12075:
----------------------------------------

             Summary: TestLargeCluster is too flaky
                 Key: SOLR-12075
                 URL: https://issues.apache.org/jira/browse/SOLR-12075
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: AutoScaling
            Reporter: Andrzej Bialecki 
            Assignee: Andrzej Bialecki 


This test fails frequently in Jenkins builds, with two types of failures:
 * specific test method failures - these may be caused by bugs in the 
autoscaling code, bugs in the simulator, or timing issues. It should be possible 
to narrow down the cause by running with different speeds of simulated time.
 * suite-level failures due to leaked threads - most of these show a thread 
that is still running Policy calculations, e.g.:
{code}
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.autoscaling.sim.TestLargeCluster: 
  1) Thread[id=21406, name=AutoscalingActionExecutor-7277-thread-1, state=RUNNABLE, group=TGRP-TestLargeCluster]
       at java.util.ArrayList.iterator(ArrayList.java:834)
       at org.apache.solr.common.util.Utils.getDeepCopy(Utils.java:131)
       at org.apache.solr.common.util.Utils.makeDeepCopy(Utils.java:110)
       at org.apache.solr.common.util.Utils.getDeepCopy(Utils.java:92)
       at org.apache.solr.common.util.Utils.makeDeepCopy(Utils.java:108)
       at org.apache.solr.common.util.Utils.getDeepCopy(Utils.java:92)
       at org.apache.solr.common.util.Utils.getDeepCopy(Utils.java:74)
       at org.apache.solr.client.solrj.cloud.autoscaling.Row.copy(Row.java:91)
       at org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.lambda$getMatrixCopy$1(Policy.java:297)
       at org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session$$Lambda$466/1757323495.apply(Unknown Source)
       at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
       at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
       at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
       at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
       at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
       at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
       at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
       at org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.getMatrixCopy(Policy.java:298)
       at org.apache.solr.client.solrj.cloud.autoscaling.Policy$Session.copy(Policy.java:287)
       at org.apache.solr.client.solrj.cloud.autoscaling.Row.removeReplica(Row.java:156)
       at org.apache.solr.client.solrj.cloud.autoscaling.MoveReplicaSuggester.tryEachNode(MoveReplicaSuggester.java:60)
       at org.apache.solr.client.solrj.cloud.autoscaling.MoveReplicaSuggester.init(MoveReplicaSuggester.java:34)
       at org.apache.solr.client.solrj.cloud.autoscaling.Suggester.getSuggestion(Suggester.java:129)
       at org.apache.solr.cloud.autoscaling.ComputePlanAction.process(ComputePlanAction.java:98)
       at org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$null$3(ScheduledTriggers.java:307)
       at org.apache.solr.cloud.autoscaling.ScheduledTriggers$$Lambda$439/951218654.run(Unknown Source)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
       at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$9/1677458082.run(Unknown Source)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
       at __randomizedtesting.SeedInfo.seed([C6FA0364D13DAFCC]:0)
{code}
It's possible that an InterruptedException is caught somewhere and not 
propagated, so the Policy calculations don't terminate when the thread is 
interrupted while the parent components are being closed.
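
The suspected failure mode can be illustrated in isolation. What follows is a minimal sketch, not code from Solr: the class and method names are hypothetical, and a bounded sleep loop stands in for the Policy calculations. It compares a worker that restores the interrupt flag and returns with one that swallows the InterruptedException and keeps going after the executor is shut down.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names exist in Solr.
class InterruptPropagationSketch {

    /**
     * Runs a stand-in "calculation" on an executor, then shuts the executor
     * down the way a closing parent component might. Returns true if the
     * worker stopped within 500 ms of shutdownNow().
     */
    static boolean stopsPromptly(boolean restoreInterrupt) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        executor.execute(() -> {
            started.countDown();
            // Bounded loop (about 2 s in total) so the sketch itself never leaks a thread.
            for (int i = 0; i < 200; i++) {
                try {
                    Thread.sleep(10); // stand-in for one step of the calculation
                } catch (InterruptedException e) {
                    if (restoreInterrupt) {
                        // Propagate: restore the flag and stop working.
                        Thread.currentThread().interrupt();
                        return;
                    }
                    // Swallowed: the flag is now cleared and the loop carries on.
                }
            }
        });
        try {
            started.await();
            executor.shutdownNow(); // interrupts the worker thread
            boolean prompt = executor.awaitTermination(500, TimeUnit.MILLISECONDS);
            // Let the swallowing variant run out its bounded loop before returning.
            executor.awaitTermination(10, TimeUnit.SECONDS);
            return prompt;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("propagating variant stops promptly: " + stopsPromptly(true));
        System.out.println("swallowing variant stops promptly: " + stopsPromptly(false));
    }
}
```

In this sketch the swallowing variant keeps running after shutdownNow() because catching InterruptedException clears the thread's interrupt status. If the real code has a similar catch block, or a long CPU-bound loop that never checks Thread.currentThread().isInterrupted(), the executor thread could outlive the test suite in the same way.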


