Right, so if I'm doing the math right, you have 2,400 replicas per JVM?
I'm not clear whether each node has a single JVM or not.
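Here is a quick Java check of the arithmetic behind that figure. It assumes the layout from Yago's message (~1.2K collections x 12 shards x 2 replicas), an even spread over the 12 nodes, and one JVM per node, which is the open question above.

```java
// Back-of-the-envelope check of the replica count discussed in this thread.
// Assumes replicas are spread evenly and that each node runs a single JVM.
public class ReplicasPerNode {

    // Total replicas = collections x shards per collection x replicas per shard.
    public static long totalReplicas(long collections, long shardsPerCollection, long replicasPerShard) {
        return collections * shardsPerCollection * replicasPerShard;
    }

    // Even spread across nodes; integer division is exact for these numbers.
    public static long replicasPerNode(long collections, long shardsPerCollection,
                                       long replicasPerShard, long nodes) {
        return totalReplicas(collections, shardsPerCollection, replicasPerShard) / nodes;
    }

    public static void main(String[] args) {
        long total = totalReplicas(1_200, 12, 2);
        long perNode = replicasPerNode(1_200, 12, 2, 12);
        System.out.println(total + " replicas total, " + perNode + " per node");
    }
}
```

That gives 28,800 replicas cluster-wide and 2,400 per node. If a node runs more than one JVM, the per-JVM number drops accordingly.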

Anyway, 2048 is indeed much too high for coreLoadThreads. If nothing else, dropping it to,
say, 64 would show whether this was the real root of your problem. Even if
that slowed startup unacceptably, it would tell you whether this was, indeed,
the problem.
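To make the trade-off concrete, below is a toy Java model. It is not Solr's implementation. It assumes coreLoadThreads behaves like the size of a fixed pool of threads that load cores in parallel, and the class and method names are hypothetical. It submits many simulated core loads to a fixed pool and records how many ran at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model, not Solr code: a fixed pool of poolSize threads works through
// coreCount simulated core loads while we record the peak number running at once.
public class CoreLoadPoolDemo {

    public static int peakConcurrency(int poolSize, int coreCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < coreCount; i++) {
            pool.execute(() -> {
                // Count this load as running and update the observed peak.
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(5); // stand-in for the work of opening one core
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // However many cores there are, the pool size caps concurrent loads.
        System.out.println("pool of 8:  peak " + peakConcurrency(8, 200));
        System.out.println("pool of 64: peak " + peakConcurrency(64, 200));
    }
}
```

In this model the only knob is the pool size. Real core loading also competes for disk, CPU and ZooKeeper, which a sleep does not capture.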

Is this a multi-tenant situation? I'm trying to understand why you need
so many cores. Having 1,200 collections with 12 shards each seems like
massive over-sharding. How many docs exist in each core? I'm
wondering if you've backed yourself into a corner through unnecessary sharding.
If you could, say, reduce your shards per collection to two (or even one?), you
might get out of this bind cheaply.
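To put rough numbers on that, the sketch below holds the thread's other figures fixed (~1,200 collections, 2 replicas per shard, 12 nodes) and assumes an even spread. Under those assumptions the per-node core count scales linearly with shards per collection. The 2- and 1-shard layouts are hypothetical.

```java
// Per-node core count for different shards-per-collection choices,
// assuming ~1,200 collections, 2 replicas per shard, 12 nodes, even spread.
public class ShardingComparison {

    public static long coresPerNode(long collections, long shards, long replicas, long nodes) {
        return collections * shards * replicas / nodes;
    }

    public static void main(String[] args) {
        for (long shards : new long[] {12, 2, 1}) {
            System.out.printf("%2d shards/collection -> %d cores per node%n",
                    shards, coresPerNode(1_200, shards, 2, 12));
        }
    }
}
```

That works out to 2,400 cores per node at 12 shards, 400 at 2 shards, and 200 at 1 shard.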

I regularly see 50M docs on a single shard give very good performance
FWIW.
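That 50M figure can be turned into a rough shard-count estimate with a ceiling division. Treat 50M as the rule of thumb stated here, not a Solr limit: real capacity depends on document size, query load and hardware. The doc counts in `main` are made up because the thread doesn't give them, and the class name is hypothetical.

```java
// Rough shard-count estimate from a per-shard document budget.
// DOCS_PER_SHARD_BUDGET reflects the "50M docs on a single shard" rule of thumb.
public class ShardEstimate {

    static final long DOCS_PER_SHARD_BUDGET = 50_000_000L;

    // Ceiling division, with a floor of one shard for empty collections.
    // docsPerShard must be positive.
    public static long shardsNeeded(long docsInCollection, long docsPerShard) {
        if (docsInCollection <= 0) {
            return 1;
        }
        return (docsInCollection + docsPerShard - 1) / docsPerShard;
    }

    public static void main(String[] args) {
        long[] hypotheticalDocCounts = {2_000_000L, 50_000_000L, 120_000_000L};
        for (long docs : hypotheticalDocCounts) {
            System.out.println(docs + " docs -> "
                    + shardsNeeded(docs, DOCS_PER_SHARD_BUDGET) + " shard(s)");
        }
    }
}
```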

Best,
Erick

On Thu, Dec 15, 2016 at 11:55 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
> Yes, I changed the value of coreLoadThreads.
>
> With the default value, a node takes about 40 minutes to become available with all
> replicas up.
>
> Right now I have ~1.2K collections with 12 shards each and 2 replicas, spread across
> 12 nodes. The value I configured (2048) may indeed be too high, but with it I can
> start nodes in 10 minutes.
>
> I should probably review it and pick something more conservative.
>
> --
>
> /Yago Riveiro
>
> On 15 Dec 2016, 16:43 +0000, Erick Erickson <erickerick...@gmail.com>, wrote:
>> Hmmm, have you changed coreLoadThreads? A while back we had a problem with this
>> when loading lots and lots of cores; see:
>> https://issues.apache.org/jira/browse/SOLR-7280
>>
>> But that was fixed in 6.2, so unless you changed the number of threads
>> used to load cores, it shouldn't be a problem on 6.3...
>>
>> The symptom was also that replicas would never change to "active";
>> they'd be stuck in recovery or down.
>>
>> Best,
>> Erick
>>
>> On Thu, Dec 15, 2016 at 3:07 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>> > Hi,
>> >
>> > I'm getting this error in my log:
>> >
>> > 12/15/2016, 9:28:18 AM ERROR true ExecutorUtil Uncaught exception
>> > java.lang.StackOverflowError thrown by thread:
>> > coreZkRegister-1-thread-48-processing-n:XXX.XXX.XXX.XXX:8983_solr x:collection1_shard3_replica2 s:shard3 c:collection1-visitors r:core_node5
>> > java.lang.Exception: Submitter stack trace
>> > at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:204)
>> > at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:204)
>> > at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:505)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> > at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> > at java.lang.Thread.run(Thread.java:745)
>> >
>> >
>> >
>> > -----
>> > Best regards
>> > --
>> > View this message in context: 
>> > http://lucene.472066.n3.nabble.com/Uncaught-exception-java-lang-StackOverflowError-in-6-3-0-tp4309849.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
