As a follow-up, the default DirectoryFactory is "NRTCachingDirectoryFactory", not MMapDirectory. It is mentioned that NRTCachingDirectoryFactory "caches small files in memory for better NRT performance".

Wondering if this would also consume physical memory to the same extent as MMapDirectory. Thoughts?

On 8/18/15 9:29 AM, Erick Erickson wrote:
Couple of things:

1> Here's an excellent backgrounder for MMapDirectory, which is
what makes it appear that Solr is consuming all the physical memory....

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

2> It's possible that your transaction log was huge. Perhaps not likely,
but possible. If Solr terminates abnormally (kill -9 is a prime way to do this),
then upon restart the transaction log is replayed. This log is rolled over on
every hard commit (openSearcher true or false doesn't matter). So, in the
scenario where you are indexing a whole lot of stuff without committing,
it can take a very long time to replay the log. Not only that, but while you
replay the log, any incoming updates are written to the end of the tlog. That
said, nothing in your e-mails indicates this could be a problem, and it's
frankly not consistent with the errors you _do_ report, but I thought
I'd mention it.
See: 
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
You can avoid the possibility of this by configuring your autoCommit interval
to be relatively short (say 60 seconds) with openSearcher=false....
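That autoCommit interval lives in solrconfig.xml. A minimal sketch of what Erick describes (the 60-second value matches his suggestion; treat it as a starting point, not a tuned recommendation):

```xml
<!-- solrconfig.xml: hard-commit every 60s without opening a new searcher.
     Each hard commit rolls over the transaction log, keeping replay short. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```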

3> ConcurrentUpdateSolrServer isn't the best thing for bulk loading SolrCloud;
CloudSolrServer (renamed CloudSolrClient in 5.x) is better. CUSS sends all
the docs to some node, and from there that node figures out which
shard each doc belongs on and forwards the doc (actually in batches) to the
appropriate leader. So doing what you're doing creates a lot of cross-chatter
among nodes. CloudSolrServer/Client figures that out on the client side and
sends each leader only packets consisting of the docs that belong on
that shard. You can get nearly linear throughput with increasing numbers of
shards this way.
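A rough sketch of the client-side routing Erick describes, using the SolrJ 5.x CloudSolrClient (assumes solr-solrj on the classpath and a running cluster; the ZooKeeper address, collection name, and batch size are placeholders, not values from this thread):

```java
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.util.ArrayList;
import java.util.List;

public class BulkLoad {
    public static void main(String[] args) throws Exception {
        // CloudSolrClient watches cluster state in ZooKeeper and routes each
        // batch directly to the correct shard leader, avoiding cross-chatter.
        try (CloudSolrClient client =
                 new CloudSolrClient("zk1:2181,zk2:2181/solr")) {
            client.setDefaultCollection("mycollection");

            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 10000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i));
                batch.add(doc);
                if (batch.size() == 500) {   // send batches, not single docs
                    client.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                client.add(batch);
            }
            client.commit();                 // or rely on autoCommit instead
        }
    }
}
```

The batching matters as much as the client choice: per-document adds pay a full request round trip each time, while batches amortize it.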

Best,
Erick

On Tue, Aug 18, 2015 at 9:03 AM, Rallavagu <rallav...@gmail.com> wrote:
Thanks Shawn.

All participating cloud nodes are running Tomcat, and as you suggested I will
review the number of threads and increase them as needed.

Essentially, what I noticed was that two of the four nodes caught up with the
"bulk" updates instantly, while the other two nodes took almost 3 hours to
come completely in sync with the "leader". I "tickled" the other nodes by
sending an update, thinking that it would initiate the replication, but I'm
not sure if that is what caused the other two nodes to eventually catch up.

On a similar note, I was using "ConcurrentUpdateSolrServer" pointing directly
at the leader to bulk load the Solr cloud. I have configured the chunk size and
thread count for it. Is this the right practice for bulk loading
SolrCloud?

Also, the maximum-connections-per-host parameter for
"HttpShardHandler" is in solrconfig.xml, I suppose?

Thanks



On 8/18/15 8:28 AM, Shawn Heisey wrote:

On 8/18/2015 8:18 AM, Rallavagu wrote:

Thanks for the response. Does this cache behavior influence the delay
in catching up with the cloud? How can we explain SolrCloud replication,
and what are the options to monitor and take proactive action (such as
initializing, pausing, etc.) if needed?


I don't know enough about your setup to speculate.

I did notice this exception in a previous reply:

org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for
connection from pool

I can think of two things that would cause this.

One cause is that your servlet container is limiting the number of
available threads.  A typical Jetty or Tomcat default for maxThreads is
200, which can easily be exceeded by a small Solr install, especially if
it's SolrCloud.  The Jetty included with Solr sets maxThreads to 10000,
which is effectively unlimited except for extremely large installs.  If
you are providing your own container, this will almost certainly need to
be raised.
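Since you're on Tomcat, the thread limit is set on the Connector in server.xml. A sketch (the port and other attributes are illustrative; only maxThreads is the point here):

```xml
<!-- server.xml: raise the request-thread ceiling for a SolrCloud node -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="10000"
           connectionTimeout="20000" />
```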

The other cause is that your install is extremely busy and you have run
out of available HttpClient connections.  The solution in this case is
to increase the maximum number of connections per host in the
HttpShardHandler config, which defaults to 20.


https://wiki.apache.org/solr/SolrConfigXml#Configuration_of_Shard_Handlers_for_Distributed_searches
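To answer the earlier question: yes, this lives in solrconfig.xml, inside the request handler's shardHandlerFactory element. A sketch (the numbers are illustrative, not recommendations):

```xml
<!-- solrconfig.xml, inside the /select requestHandler -->
<requestHandler name="/select" class="solr.SearchHandler">
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="maxConnectionsPerHost">100</int>
    <int name="maxConnections">10000</int>
  </shardHandlerFactory>
</requestHandler>
```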

There might be other causes for that exception, but I think those are
the most common.  Depending on how things are set up, you may have
problems with both.

Thanks,
Shawn

