Re: SolrCloud scaling/optimization for high request rate

Sofiya Strochyk Mon, 29 Oct 2018 01:39:01 -0700

Erick,

Thanks, I've been pulling my hair out over this for a long time and gathered a lot of information :)

Isn't there a maxIndexingThreads setting in solrconfig with a default value of about 8? It's not clear whether my updates are being executed in parallel, but I would expect them to use at least a few threads.

In the past, we hosted 2 shards on one of the bigger nodes for some time, and this resulted in high load on that node and slow requests from those 2 shards (though not too much worse than now with only 1 shard per node), so the nodes might be too small to handle 2 or more replicas.

Anyway, thanks for your help. I'll try profiling and looking into the metrics to see if there are some pointers to CPU consumption...



On 27.10.18 05:52, Erick Erickson wrote:

Sofiya:

I haven't said so before, but it's a great pleasure to work with
someone who's done a lot of homework before pinging the list. The only
unfortunate bit is that it usually means the simple "Oh, I can fix
that without thinking about it much" doesn't work ;)

2.  I'll clarify a bit here. Any TLOG replica can become the leader.
Here's the process for an update:

- The doc comes in to the leader (which may be a TLOG replica).
- The doc is forwarded to all TLOG replicas, _but it is not indexed there_.
- If the leader fails, the other TLOG replicas have enough documents in
  _their_ tlogs to "catch up" and one is elected.
- You're totally right that PULL replicas cannot become leaders.
- Having all TLOG replicas means that the CPU cycles otherwise consumed by
  indexing are available for query processing.

The point here is that TLOG replicas don't need to expend CPU cycles
to index documents, freeing up all those cycles for serving queries.
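The update flow described above can be pictured with a small toy model. This is only an illustrative sketch, not Solr code: the `Replica` class and the `send_update` and `fail_over` functions are hypothetical names invented here, and the model leaves out the periodic index copy that non-leader replicas pull from the leader.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    kind: str                                      # "TLOG" or "PULL"
    is_leader: bool = False
    tlog: list = field(default_factory=list)       # transaction log entries
    indexed: list = field(default_factory=list)    # docs indexed locally


def send_update(replicas, doc):
    """Route one document the way the steps above describe (toy model)."""
    for r in replicas:
        if r.is_leader:
            r.tlog.append(doc)
            r.indexed.append(doc)    # only the leader spends CPU on indexing
        elif r.kind == "TLOG":
            r.tlog.append(doc)       # forwarded and logged, but not indexed
        # PULL replicas get nothing here; they only copy the leader's index


def fail_over(replicas):
    """Drop the current leader and promote a surviving TLOG replica."""
    survivors = [r for r in replicas if not r.is_leader]
    candidates = [r for r in survivors if r.kind == "TLOG"]
    if not candidates:
        raise RuntimeError("no TLOG replica available to become leader")
    new_leader = candidates[0]
    # "Catch up": in this toy model, replay the whole tlog into the index.
    new_leader.indexed = list(new_leader.tlog)
    new_leader.is_leader = True
    return survivors
```

In this model a PULL replica never appears among the candidates, which mirrors the point that only TLOG replicas can be elected.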

Now, that said, you report that the QPS rate doesn't particularly seem to
be affected by whether you're indexing or not, so that makes using
TLOG and PULL replicas less likely to solve your problem. I was
thinking about your statement that you index as fast as possible....


6. This is a little surprising. Here's my guess: You're indexing in
large batches and each batch is only really occupying a thread or two,
so it's effectively serialized and thus not consuming a huge amount of
resources.
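One way to test that guess would be to split the same documents into smaller batches and send them from several client threads, then compare CPU usage on the Solr side. The following is a minimal sketch under stated assumptions: `index_in_parallel`, `chunked`, and `fake_send_batch` are hypothetical names, the batch size and worker count are arbitrary, and `fake_send_batch` is a placeholder for whatever client call actually posts a batch to the update handler.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(docs, size):
    """Yield successive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def index_in_parallel(docs, send_batch, batch_size=500, workers=4):
    """Send batches concurrently; return the total number of docs sent.

    `send_batch` is any callable that takes a list of docs and returns
    how many it sent (an assumption made for this sketch).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, chunked(docs, batch_size)))


def fake_send_batch(batch):
    # Placeholder: a real sender would POST the batch to the collection's
    # update handler and check the response.
    return len(batch)


if __name__ == "__main__":
    docs = [{"id": str(i)} for i in range(2000)]
    print(index_in_parallel(docs, fake_send_batch))
```

If indexing load on the server rises noticeably with more client threads, the single-batch case was probably serialized as guessed; if not, the bottleneck is likely elsewhere.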

So unless G1 really solves a lot of problems, more replicas are
indicated. On machines with large amounts of RAM and lots of CPUs, one
other option is to run multiple JVMs per physical node; that's
sometimes helpful.

One other possibility. In Solr 7.5, you have a ton of metrics
available. If you hit the admin/metrics endpoint you'll see 150-200
available metrics. Apart from running a profiler to see what's
consuming the most cycles, the metrics can give you a view into what
Solr is doing and may help you pinpoint what's using the most cycles.
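As a rough illustration of querying that endpoint, the sketch below builds a metrics URL and fetches it as JSON. The base URL `http://localhost:8983/solr`, the helper names `metrics_url` and `fetch_metrics`, and the example `group`/`prefix` values are assumptions for illustration; `group` and `prefix` are filter parameters of the Metrics API, but check the reference guide for your version before relying on specific metric names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def metrics_url(base_url, group=None, prefix=None):
    """Build an admin/metrics URL, optionally filtered by group and prefix."""
    params = {"wt": "json"}
    if group:
        params["group"] = group
    if prefix:
        params["prefix"] = prefix
    return f"{base_url.rstrip('/')}/admin/metrics?{urlencode(params)}"


def fetch_metrics(url, timeout=10):
    """Fetch the URL and parse the body as JSON (needs a reachable Solr)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumed local node; the "jvm" group and "os." prefix are example filters.
    url = metrics_url("http://localhost:8983/solr", group="jvm", prefix="os.")
    print(url)
    # data = fetch_metrics(url)  # uncomment when a Solr node is running
```

Filtering by group and prefix keeps the response small enough to poll repeatedly while a load test is running.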

Best,
Erick
On Fri, Oct 26, 2018 at 12:23 PM Toke Eskildsen <t...@kb.dk> wrote:

David Hastings <hastings.recurs...@gmail.com> wrote:

Would adding the docValues in the schema, but not reindexing, cause
errors? I.e., only apply the docValues after the next reindex, but in the
meantime keep functioning as if there were none until then?

As soon as you specify in the schema that a field has docValues=true, Solr
treats all existing documents as having docValues enabled for that field. As
there is no docValues content, docValues-aware functionality such as sorting and
faceting will not work for that field until the documents have been re-indexed.

- Toke Eskildsen


