There are quite a number of reasons you may be seeing this, all having
to do with trying to put too much stuff in too little hardware.

Yes, an increasing number of collections will spin up an increasing
number of threads. Your Jetty container's limited thread pool may also
be a factor. Or you may simply be starting to swap memory to and from
disk. Or you're hitting GC pauses. Or..... Given your statement that
speed decreases 50x, my suspicion is disk swapping, but that's a total
guess.
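One way to rule GC pauses in or out is to turn on GC logging for each
Solr JVM. A minimal sketch, assuming Solr 6 running on Java 8 with the
stock bin/solr.in.sh; the log path is a placeholder:

```shell
# In solr.in.sh: enable detailed GC logging (Java 8 flag syntax).
# PrintGCApplicationStoppedTime records stop-the-world pause lengths,
# which is what you'd correlate against the indexing slowdowns.
GC_LOG_OPTS="-verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -Xloggc:/var/solr/logs/solr_gc.log"
```

Separately, watching the si/so columns of `vmstat 1` on each node will
tell you whether the boxes are actually swapping.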

The key here is that you are hosting N replicas per JVM, and (maybe) M
JVMs per physical machine. It's unreasonable to assume that you can
keep adding more and more and more collections/replicas per JVM and
not eventually hit the limits of your hardware. And you mention that
indexing slows down, which leads me to assume that you're adding more
and more docs to each of these collections. This will eventually
simply blow up.

Here's a long blog on the topic:
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

There are two options that spring to mind:
1> get more hardware and distribute the replicas over more hardware
2> consider combining collections. So instead of having one collection
per user, say, have many users in the same collection and keep them
from seeing each other's data by adding the appropriate "fq" clause.
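As a sketch of option 2>, the per-user filter is just an "fq" parameter
sent with every request for that user. The user_id field name here is an
assumption; use whatever field marks document ownership in your schema:

```python
from urllib.parse import urlencode

def tenant_query(q, user_id):
    # Build the query string for a shared collection where each document
    # carries its owner in a (hypothetical) user_id field. The fq clause
    # is cached independently of q, so repeating a tenant's filter across
    # requests is cheap.
    return urlencode({"q": q, "fq": 'user_id:"%s"' % user_id})

print(tenant_query("title:solr", "alice"))
# → q=title%3Asolr&fq=user_id%3A%22alice%22
```

Because Solr caches each fq independently in the filterCache, a tenant's
filter only costs you on its first use; after that it's a cached bitset.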

At any rate, there's no a priori limit to the number of
collections/replicas/whatever that Solr can deal with; the limit is
your hardware.

Best,
Erick

On Fri, Mar 9, 2018 at 9:52 PM, 苗海泉 <mseaspr...@gmail.com> wrote:
> Hello, we found a problem. In Solr 6.0, indexing speed is influenced
> by the number of collections. Speed is normal until a limit is
> reached; beyond that limit, indexing speed decreases by about 50
> times.
>
> In our environment, there are 49 Solr nodes. With 25 shards per
> collection, we can maintain high-speed indexing until the total
> number of collections reaches about 900; reducing the number of
> collections back below that limit brings the speed back up.
> With 49 shards per collection, the total number of collections can
> only reach about 700; exceeding that value causes indexing to drop
> dramatically.
> To explain: we run a single replica per shard, because multiple
> replicas cause serious stability problems in a large Solr cluster.
>
> At first I suspected it was due to too many concurrent indexing
> threads, but the problem remained after addressing that, so now I am
> inclined to suspect the searcherExecutor thread pool. This is just my
> guess; I want to know the real reason. Can someone help?
>
> Also, I noticed that the searcherExecutor threads basically
> correspond one-to-one with a collection's shards. How can I reduce
> the number of these threads, or even shut them down? Although there
> are many collections in our environment, there are few queries, so it
> is not necessary to keep the threads open to serve queries. This is
> too wasteful.
>
> Thank you.
