On 3/11/2013 3:52 PM, jimtronic wrote:
The load test was fairly heavy (i.e., lots of users) and designed to mimic a
fully operational system with lots of users doing normal things.
There were two things I gleaned from the logs:
PERFORMANCE WARNING: Overlapping onDeckSearchers=2 appeared for several of
my more active cores
and
The non-leaders were throwing errors saying that the leader was not
responding while trying to forward updates. (sorry, I can't find that
specific error now)
My best guess is that it has something to do with the commits.
a. frequent user generated writes using
/update?commitWithin=500&waitFlush=false&waitSearcher=false
b. softCommit set to 3000
c. autoCommit set to 300,000 and openSearcher false
d. I'm also doing frequent periodic DIH updates. I guess this is
commit=true by default.
Should I omit commitWithin and set DIH to commit=false and just let soft and
autocommit do their jobs?
I've just located a previous message on this list from Mark Miller saying
that in Solr 4, commitWithin is a soft commit.
You should definitely wait for Mark or another committer to verify what
I'm saying in the small novel I am writing below.
My personal opinion is that you should have frequent soft commits (auto,
manual, commitWithin, or some combination) along with less frequent (but
not infrequent) autoCommit with openSearcher=false. The autoCommit
(which is a hard commit) does two things: it ensures that the transaction
logs do not grow out of control, and it persists changes to disk. If you
have auto soft commits and updateLog is enabled, I would say that you
are pretty safe using commit=false on your DIH updates.
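To make that concrete, here's a rough sketch of what the updateHandler
section of solrconfig.xml might look like with your numbers plugged in.
This is only an illustration of the shape, not a recommendation; check
the element names against the example config that ships with your
version:

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- transaction log; needed for SolrCloud and for durability of
         updates that have only been soft committed -->
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <!-- hard commit: flush to disk and roll the tlog, but don't open
         a new searcher -->
    <autoCommit>
      <maxTime>300000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- soft commit: make changes visible to searches -->
    <autoSoftCommit>
      <maxTime>3000</maxTime>
    </autoSoftCommit>
  </updateHandler>

With something like that in place, the DIH requests would just carry
commit=false on whatever command you're already running, for example
/dataimport?command=delta-import&commit=false (the handler path and the
command there are only a guess at what you're actually using).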
If Mark agrees with what I have said, and your config/schema checks out
OK with expected norms, you may be running into bugs. It might also be
a case of not enough CPU/RAM resources for the system load. You never
responded in another thread with the output of the 'free' command, or
the size of your indexes. Putting 13 busy Solr cores onto one box is
asking a lot of it, unless the machine has 16-32 CPU cores *and* plenty of fast
RAM to cache all your indexes in the OS disk cache. Based on what
you're saying here and in the other thread, you probably need a java
heap size of 4GB or 8GB, heavily tuned JVM garbage collection options,
and depending on the size of your indexes, 16GB may not be enough total
system RAM.
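When I say heavily tuned GC options, I mean something along these lines,
purely as an illustration (the collector choice and the numbers are
assumptions on my part, not values I'm recommending for your setup):

  java -Xms4096M -Xmx4096M \
       -XX:NewRatio=3 \
       -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
       -XX:CMSInitiatingOccupancyFraction=70 \
       -XX:+UseCMSInitiatingOccupancyOnly \
       -jar start.jar

Without some tuning like that, a 4-8GB heap under heavy indexing and
query load can produce long stop-the-world pauses, which by itself can
make the other replicas think the leader has stopped responding.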
IMHO, you should not use trunk (5.0) for anything that you plan to one
day run in production. Trunk is very volatile; large-scale changes
sometimes get committed with only minimal testing. The dev branch named
branch_4x (currently 4.3) is kept reasonably stable almost all of the
time. Version 4.2 has just been released; it is already available on
the faster mirrors, and there should be a release announcement within a
day from now.
If this is not being set up in anticipation of a production deployment,
then trunk would be fine, but bugs are to be expected. If the same
problems do not happen in 4.2 or branch_4x, then I would move the
discussion to the dev list.
Thanks,
Shawn