On 4/9/2018 12:15 PM, John Blythe wrote:
> we're starting to dive into master/slave replication architecture. we'll
> have 1 master w 4 slaves behind it. our app is NRT. if user performs an
> action in section A's data they may choose to jump to section B which will
> be dependent on having the updates from their action in section A. as such,
> we're thinking that the replication time should be set to 1-2s (the chances
> of them arriving at section B quickly enough to catch the 2s gap is highly
> unlikely at best).

Once you start talking about master-slave replication, my assumption is
that you're not running SolrCloud. You would NOT want to try to mix
SolrCloud with replication. The features do not play well together.
SolrCloud with NRT replicas (the only replica type that exists in 6.x
and earlier) may be a better option than master-slave replication.

> since the replicas will simply be looking for new files it seems like this
> would be a lightweight operation even every couple seconds for 4 replicas.
> that said, i'm going *entirely* off of assumption at this point and wanted
> to check in w you all to see any nuances, gotchas, hidden landmines, etc.
> that we should be considering before rolling things out.

Most of the time, you'd be correct to think that indexing is going to
create a new small segment and replication will have little work to do.
But as you create more and more segments, eventually Lucene is going to
start merging those segments.

For discussion purposes, I'm going to describe a situation where each
new segment created during indexing is about 100KB in size, and the
merge policy is left at the default settings. I'm also going to assume
that no documents are getting deleted or reindexed (reindexing deletes
the old version of the document). Deleted documents can have an impact
on merging, but the impact will usually only be dramatic if there are a
LOT of deleted documents.

The first ten segments created will be this 100KB size. Then Lucene is
going to see that there are enough segments to trigger the merge policy,
and it's going to combine those ten segments into one that's
approximately one megabyte. Repeat this ten times, and ten of those
1 megabyte segments will be combined into one ten megabyte segment.
Repeat all of THAT ten times, and there will be a 100 megabyte segment.
And there will eventually be another level creating 1 gigabyte segments.
If the index is below 5GB in size, the entire thing *could* be merged
into one segment by this process.

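To make the cascade described above concrete, here is a minimal Python
sketch of that simplified model. It is NOT Lucene's actual merge policy
code, just the "ten small segments become one bigger one" arithmetic.
The names FLUSH_KB, MERGE_FACTOR, MAX_MERGED_KB and simulate are made up
for illustration, and it assumes fixed 100KB flushes, no deletes,
decimal units (1GB = 1,000,000KB), and a 5GB cap on merged segments.

```python
from collections import Counter

FLUSH_KB = 100                 # size of each newly flushed segment (example value)
MERGE_FACTOR = 10              # segments combined per merge (assumed for this model)
MAX_MERGED_KB = 5 * 1_000_000  # never build a merged segment larger than ~5GB


def simulate(num_flushes):
    """Run the simplified cascade for num_flushes new segments.

    Returns (segments, merges): segments maps segment size in KB to how
    many segments of that size remain; merges lists the size of every
    merged segment in the order it was created. Each entry in merges is
    a brand new file that replication would have to copy in full.
    """
    segments = Counter()
    merges = []
    for _ in range(num_flushes):
        segments[FLUSH_KB] += 1
        size = FLUSH_KB
        # A merge at one tier can fill the next tier, so keep cascading.
        while (segments[size] >= MERGE_FACTOR
               and size * MERGE_FACTOR <= MAX_MERGED_KB):
            segments[size] -= MERGE_FACTOR
            merged = size * MERGE_FACTOR
            segments[merged] += 1
            merges.append(merged)
            size = merged
    return +segments, merges   # unary plus drops sizes whose count is zero


if __name__ == "__main__":
    for flushes in (10, 100, 1_000, 10_000):
        segments, merges = simulate(flushes)
        print(flushes, "flushes ->", dict(segments),
              "| largest merge so far:", max(merges), "KB")
```

In this model the first 1GB segment appears on the 10,000th flush. If a
new segment were flushed every two seconds, that would be roughly five
and a half hours of indexing, so the big transfers are rare, but they do
happen.
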
The end result of all this: replication is not always going to be
super-quick. If merging creates a 1 gigabyte segment, then the time it
takes to transfer that new segment is going to depend on how fast your
disks are and how fast your network is. If you're using commodity SATA
drives in the 4 to 10 terabyte range and a gigabit network, the network
is probably going to be the bottleneck, assuming that the system has
plenty of memory and isn't under a high load.

If the network is the bottleneck in that situation, it's probably going
to take close to ten seconds to transfer a 1GB segment (a gigabit link
moves at most about 125 megabytes per second), and the greater part of a
minute to transfer a 5GB segment, which is the biggest one that the
default merge policy configuration will create without an optimize
operation.

Also, you should understand something that has come to my attention
recently (and is backed up by documentation): if the master does a soft
commit and the segment that was committed remains in memory (not flushed
to disk), that segment will NOT be replicated to the slaves. It has to
get flushed to disk before it can be replicated.

Thanks,
Shawn