One mechanism that comes to mind is swapping slowing an update down
enough to cause a timeout.

Here's the process:
- Leader sends the doc to a follower
- The leader's request to the follower times out
- Leader says "that replica must be sick, I'll tell it to recover"

The smoking gun here is if you see any messages about
"leader-initiated recovery". grep for both "leader" and "initiated",
because the message text is inconsistent; it may be "leader initiated"
or "leader-initiated". And grep the logs on both the leader and the
follower.

On the leader you should also see messages in the log about the update
timing out.

The follower won't have any errors at all; the doc indexed
successfully, it just took a long time.

This is a total shot in the dark, BTW. I'm rather surprised that it
takes 3 or more hours for 200K docs. If they're really big and you're
sending them in batches, you may just be taking a while to process the
batch, and swapping may have nothing to do with it. Perhaps smaller
batches would help if that's the case.

Long GC pauses can also cause this to happen, BTW. ZooKeeper will
periodically ping the node to see if it's up. If the ping times out,
ZK can cause the node to go into recovery (actually, I think the
leader gets a message and puts the follower into recovery). Examining
the GC logs should tell you whether that's possible. 3-4 second
stop-the-world GC pauses (and those are excessive IMO) shouldn't
hurt, although with heaps that small I wouldn't expect much in the
way of stop-the-world GC pauses.
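One quick way to scan a GC log for long pauses, assuming JDK 8-style logging with -XX:+PrintGCApplicationStoppedTime enabled (the sample lines below are fabricated; in practice pipe in your actual gc.log):

```shell
# Print only pauses longer than 1 second. The awk script finds the
# number following "stopped:" so it works whether or not the line has
# a trailing "Stopping threads took:" clause.
printf '%s\n' \
  'Total time for which application threads were stopped: 0.0123456 seconds' \
  'Total time for which application threads were stopped: 3.2100000 seconds, Stopping threads took: 0.0001 seconds' |
awk '/application threads were stopped/ {
  for (i = 1; i <= NF; i++)
    if ($i == "stopped:") { if ($(i+1) + 0 > 1.0) print; break }
}'
```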

Best,
Erick

On Fri, Dec 15, 2017 at 10:29 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 12/15/2017 10:53 AM, Bill Oconnor wrote:
>> The recovering server has a much larger swap usage than the other servers in 
>> the cluster. We think this is related to the mmap files used for indexes. 
>> The server eventually recovers but it triggers alerts for devops which are 
>> annoying.
>>
>> I have found a previous mailing-list question (which Shawn responded to) with almost 
>> an identical problem from 2014 but there is no suggested remedy. ( 
>> http://lucene.472066.n3.nabble.com/Solr-4-3-1-memory-swapping-td4126641.html)
>
> Solr itself cannot influence swap usage.  This is handled by the
> operating system.  I have no idea whether Java can influence swap usage,
> but if it can, this is also outside our control and I have no idea what
> to tell you.  My guess is that Java is unlikely to influence swap usage,
> but only a developer on the JDK team could tell you that for sure.
>
> Assuming we're dealing with a Linux machine, my recommendation would be
> to set vm.swappiness to 0 or 1, so that the OS is not aggressively
> deciding that data should be swapped.  The default for vm.swappiness on
> Linux is 60, which is quite aggressive.
>
>> Questions :
>>
>> Is there progress regarding this?
>
> As mentioned previously, there's nothing Solr or Lucene can do to help,
> because it's software completely outside our ability to influence.  The
> operating system will make those decisions.
>
> If your software tries to use more memory than the machine has, then
> swap is going to get used no matter how the OS is configured, and when
> that happens, performance will suffer greatly.  In the case of
> SolrCloud, it would make basic operation go so slowly that timeouts
> would get exceeded, and Solr would initiate recovery.
>
> If the OS is Linux, I would like to see a screenshot from the "top"
> program.  (Not htop or anything else; the program needs to be top.)
> Run the program, press shift-M to sort the list by memory, and grab a
> screenshot.
>
> Thanks,
> Shawn
>
