Re: SolrClould 6.6 stability challenges

2017-11-05 Thread Rick Dig
hi Emir - the document size would be an average of less than 1.5kb. it is actually 2000 queries / min - queries are primarily autocomplete + highlighting (on a multivalued field with different payloads), search and faceting. what should we watch for that would indicate that we are overloading

Re: SolrClould 6.6 stability challenges

2017-11-05 Thread Rick Dig
hi Shawn, all, answers inline. also, another discovery, not sure if completely useful. even when we increase the autocommit values to say an hour, the nodes go "down" in 10-15 minutes. so either we are doing something wrong with autocommit settings and commits are continuing to happen frequently

Re: SolrClould 6.6 stability challenges

2017-11-05 Thread Erick Erickson
Check the leader and follower logs for anything like "leader initiated recovery" (LIR). One thing I have seen where followers go into recovery is if, for some reason, the time it takes to respond to an update exceeds the timeout. The scenario is this:
> leader sends an update
> follower fails to

Re: SolrClould 6.6 stability challenges

2017-11-05 Thread Shawn Heisey
On 11/3/2017 10:15 PM, Rick Dig wrote:
we are trying to run solrcloud 6.6 in a production setting. here's our config and issue
1) 3 nodes, 1 shard, replication factor 3
2) all nodes are 16GB RAM, 4 core
3) Our production load is about 2000 requests per minute
4) index is fairly small, index size

Re: SolrClould 6.6 stability challenges

2017-11-05 Thread Emir Arnautović
Hi Rick, I quickly looked at GC logs and didn’t see obvious issues. You mentioned that batch processing takes ~20s and it is 500 documents. With 5-7 indexing threads it is ~150 documents/s. Are those big documents? With 200 queries/min (~3-4 queries/s - what sort of queries?) and 5-7 indexing
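The throughput figures quoted in this message can be checked with a quick calculation. The sketch below assumes each indexing thread sends one 500-document batch roughly every 20 seconds and that the threads run concurrently; the thread does not state this explicitly, and the function names are illustrative only. It also converts both the 200 queries/min figure used here and the 2000 queries/min figure Rick gives in his reply.

```python
# Back-of-the-envelope check of the indexing and query rates quoted in the thread.
# Assumption: every indexing thread completes one batch per BATCH_SECONDS, in parallel.
BATCH_DOCS = 500      # documents per batch, as reported
BATCH_SECONDS = 20    # approximate time to process one batch, as reported


def docs_per_second(threads: int) -> float:
    """Aggregate indexing rate for the given number of concurrent threads."""
    return threads * BATCH_DOCS / BATCH_SECONDS


def per_second(per_minute: float) -> float:
    """Convert a per-minute rate into a per-second rate."""
    return per_minute / 60


if __name__ == "__main__":
    for threads in (5, 7):
        print(f"{threads} threads: {docs_per_second(threads):.0f} docs/s")
    for qpm in (200, 2000):
        print(f"{qpm} queries/min = {per_second(qpm):.1f} queries/s")
```

Under these assumptions, 5 to 7 threads bracket the ~150 documents/s estimate, while the corrected query load is about ten times the 3-4 queries/s considered here.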

Re: SolrClould 6.6 stability challenges

2017-11-04 Thread Rick Dig
not committing after the batch. made sure we have that turned off. maxTime is set to 30 (300 seconds), openSearcher is set to true.
On Sat, Nov 4, 2017 at 6:50 PM, Amrit Sarkar wrote:
> Pretty much what Emir has stated. I want to know, when you saw;
>
> > all of this

Re: SolrClould 6.6 stability challenges

2017-11-04 Thread Amrit Sarkar
Pretty much what Emir has stated. I want to know, when you saw;
> all of this runs perfectly ok when indexing isn't happening. as soon as
> we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> minutes.
When you say "NRT" indexing, what is the commit strategy in indexing.

Re: SolrClould 6.6 stability challenges

2017-11-04 Thread Emir Arnautović
Hi Rick, Do you see any errors in logs? Do you have any monitoring tool? Maybe you can check heap and GC metrics around time when incident happened. It is not large heap but some major GC could cause pause large enough to trigger some snowball and end up with node in recovery state. What is

SolrClould 6.6 stability challenges

2017-11-03 Thread Rick Dig
hello all, we are trying to run solrcloud 6.6 in a production setting. here's our config and issue
1) 3 nodes, 1 shard, replication factor 3
2) all nodes are 16GB RAM, 4 core
3) Our production load is about 2000 requests per minute
4) index is fairly small, index size is around 400 MB with 300k