great. thanks, erick!

--
John Blythe

On Wed, Apr 11, 2018 at 12:16 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> bq: are you simply flagging the fact that we wouldn't direct the queries
> to A v. B v. C since SolrCloud will make the decisions itself as to which
> part of the distro gets hit for the operation
>
> Yep. SolrCloud takes care of it all itself. I should also add that there
> are about a zillion metrics now available in Solr that you can use to make
> the best use of hardware, including things like CPU usage, I/O, GC etc.
> SolrCloud doesn't _yet_ make use of these but will in future. The current
> software LB does a pretty simple round-robin distribution.
>
> Best,
> Erick
>
> On Wed, Apr 11, 2018 at 5:57 AM, John Blythe <johnbly...@gmail.com> wrote:
> > thanks, erick. great info.
> >
> >> although you can't (yet) direct queries to one or the other. So just
> >> making them all NRT and forgetting about it is reasonable.
> >
> > are you simply flagging the fact that we wouldn't direct the queries to A
> > v. B v. C since SolrCloud will make the decisions itself as to which part
> > of the distro gets hit for the operation? if not, can you expound on this
> > a bit more?
> >
> >> The very nature of merging is such that you will _always_ get large
> >> merges until you have 5G segments (by default)
> >
> > bummer
> >
> >> Quite possible, but you have to route things yourself. But in that case
> >> you're limited to one machine to handle all your NRT traffic. I skimmed
> >> your post so don't know whether your NRT traffic load is high enough to
> >> worry about.
> >
> > ok. i think we'll take a two-pronged approach. for the immediate purposes
> > of trying to solve an issue we've begun encountering we will begin
> > thoroughly testing the load between various operations in the
> > master-slave setup we've set up. pending the results, we can roll forward
> > w a temporary patch in which all end-user touch points route through the
> > primary box for read/write while large scale operations/processing we do
> > in the background will point to the ELB the slaves are sitting behind.
> > we'll also begin setting up a simple solrcloud instance to toy with per
> > your suggestion above. inb4 tons more questions on my part :)
> >
> > thanks!
> >
> > --
> > John Blythe
> >
> > On Tue, Apr 10, 2018 at 11:14 AM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> bq: should we try to bite the solrcloud bullet and be done w it
> >>
> >> that's what I'd do. As of 7.0 there are different "flavors", TLOG,
> >> PULL and NRT so that's also a possibility, although you can't (yet)
> >> direct queries to one or the other. So just making them all NRT and
> >> forgetting about it is reasonable.
> >>
> >> bq: is there some more config work we could put in place to avoid ...
> >> commit issue and the ultra large merge dangers
> >>
> >> No. The very nature of merging is such that you will _always_ get
> >> large merges until you have 5G segments (by default). The max segment
> >> size (outside "optimize/forceMerge/expungeDeletes" which you shouldn't
> >> do) is 5G so the steady-state worst-case segment pull is limited to
> >> that.
> >>
> >> bq: maybe for our initial need we use Master for writing and user
> >> access in NRT events, but slaves for the heavier backend
> >>
> >> Quite possible, but you have to route things yourself. But in that
> >> case you're limited to one machine to handle all your NRT traffic. I
> >> skimmed your post so don't know whether your NRT traffic load is high
> >> enough to worry about.
> >>
> >> The very first thing I'd do is set up a simple SolrCloud setup and
> >> give it a spin. Unless your indexing load is quite heavy, the added
> >> work the NRT replicas have in SolrCloud isn't a problem so worrying
> >> about that is premature optimization unless you have a heavy load.
> >>
> >> Best,
> >> Erick
> >>
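For reference, the 5G ceiling Erick mentions corresponds to TieredMergePolicy's
maxMergedSegmentMB setting. A minimal solrconfig.xml sketch with those defaults
spelled out (the values shown are the assumed Lucene defaults, not settings
taken from anyone's setup in this thread):

  <indexConfig>
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <!-- cap on merged segment size outside forceMerge/optimize;
           5120 MB is the assumed default, i.e. the "5G" referred to above -->
      <double name="maxMergedSegmentMB">5120</double>
      <!-- assumed defaults; these drive the "ten segments per level"
           merge behavior Shawn walks through later in the thread -->
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicyFactory>
  </indexConfig>

Lowering maxMergedSegmentMB would shrink the worst-case segment a slave has to
pull, at the cost of keeping more segments around; it does not prevent large
merges, which is Erick's point.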
> >> On Mon, Apr 9, 2018 at 4:36 PM, John Blythe <johnbly...@gmail.com> wrote:
> >> > Thanks a bunch for the thorough reply, Shawn.
> >> >
> >> > Phew. We’d chosen to go w Master-slave replication instead of SolrCloud
> >> > per the sudden need we had encountered and the desire to avoid the
> >> > nuances and changes related to moving to SolrCloud. But so much for
> >> > this being a more straightforward solution, huh?
> >> >
> >> > Few questions:
> >> > - should we try to bite the solrcloud bullet and be done w it?
> >> > - is there some more config work we could put in place to avoid the
> >> > soft commit issue and the ultra large merge dangers, keeping the
> >> > replications happening quickly?
> >> > - maybe for our initial need we use Master for writing and user access
> >> > in NRT events, but slaves for the heavier backend processing. Thoughts?
> >> > - anyone do consulting on this that would be interested in chatting?
> >> >
> >> > Thanks again!
> >> >
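On the "keeping the replications happening quickly" point, the polling
frequency is set on the slaves in the ReplicationHandler. A minimal
solrconfig.xml sketch of the 1 master / 4 slaves layout being discussed
(host name, core name, and the 2-second interval are illustrative
assumptions, not values from this thread):

  <!-- master: publish a new index version after every hard commit -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
    </lst>
  </requestHandler>

  <!-- each slave: poll the master every 2 seconds (HH:MM:SS) -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/yourcore</str>
      <str name="pollInterval">00:00:02</str>
    </lst>
  </requestHandler>

As Shawn explains below, a short pollInterval only helps once segments have
been flushed to disk; it does nothing for data sitting in a soft-committed,
in-memory segment, and a poll that coincides with a large merge still has to
pull the whole merged segment.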
> >> > On Mon, Apr 9, 2018 at 18:18 Shawn Heisey <apa...@elyograg.org> wrote:
> >> >
> >> >> On 4/9/2018 12:15 PM, John Blythe wrote:
> >> >> > we're starting to dive into master/slave replication architecture.
> >> >> > we'll have 1 master w 4 slaves behind it. our app is NRT. if user
> >> >> > performs an action in section A's data they may choose to jump to
> >> >> > section B which will be dependent on having the updates from their
> >> >> > action in section A. as such, we're thinking that the replication
> >> >> > time should be set to 1-2s (the chances of them arriving at section
> >> >> > B quickly enough to catch the 2s gap is highly unlikely at best).
> >> >>
> >> >> Once you start talking about master-slave replication, my assumption
> >> >> is that you're not running SolrCloud. You would NOT want to try and
> >> >> mix SolrCloud with replication. The features do not play well
> >> >> together. SolrCloud with NRT replicas (this is the only replica type
> >> >> that exists in 6.x and earlier) may be a better option than
> >> >> master-slave replication.
> >> >>
> >> >> > since the replicas will simply be looking for new files it seems
> >> >> > like this would be a lightweight operation even every couple seconds
> >> >> > for 4 replicas. that said, i'm going *entirely* off of assumption at
> >> >> > this point and wanted to check in w you all to see any nuances,
> >> >> > gotchas, hidden landmines, etc. that we should be considering before
> >> >> > rolling things out.
> >> >>
> >> >> Most of the time, you'd be correct to think that indexing is going to
> >> >> create a new small segment and replication will have little work to
> >> >> do. But as you create more and more segments, eventually Lucene is
> >> >> going to start merging those segments. For discussion purposes, I'm
> >> >> going to describe a situation where each new segment during indexing
> >> >> is about 100KB in size, and the merge policy is left at the default
> >> >> settings. I'm also going to assume that no documents are getting
> >> >> deleted or reindexed (which will delete the old version). Deleted
> >> >> documents can have an impact on merging, but it will usually only be a
> >> >> dramatic impact if there are a LOT of deleted documents.
> >> >>
> >> >> The first ten segments created will be this 100KB size. Then Lucene is
> >> >> going to see that there are enough segments to trigger the merge
> >> >> policy - it's going to combine ten of those segments into one that's
> >> >> approximately one megabyte. Repeat this ten times, and ten of those 1
> >> >> megabyte segments will be combined into one ten megabyte segment.
> >> >> Repeat all of THAT ten times, and there will be a 100 megabyte
> >> >> segment. And there will eventually be another level creating 1
> >> >> gigabyte segments. If the index is below 5GB in size, the entire thing
> >> >> *could* be merged into one segment by this process.
> >> >>
> >> >> The end result of all this: Replication is not always going to be
> >> >> super-quick. If merging creates a 1 gigabyte segment, then the amount
> >> >> of time to transfer that new segment is going to depend on how fast
> >> >> your disks are, and how fast your network is. If you're using
> >> >> commodity SATA drives in the 4 to 10 terabyte range and a gigabit
> >> >> network, the network is probably going to be the bottleneck --
> >> >> assuming that the system has plenty of memory and isn't under a high
> >> >> load. If the network is the bottleneck in that situation, it's
> >> >> probably going to take close to ten seconds to transfer a 1GB segment,
> >> >> and the greater part of a minute to transfer a 5GB segment, which is
> >> >> the biggest one that the default merge policy configuration will
> >> >> create without an optimize operation.
> >> >>
> >> >> Also, you should understand something that has come to my attention
> >> >> recently (and is backed up by documentation): If the master does a
> >> >> soft commit and the segment that was committed remains in memory (not
> >> >> flushed to disk), that segment will NOT be replicated to the slaves.
> >> >> It has to get flushed to disk before it can be replicated.
> >> >>
> >> >> Thanks,
> >> >> Shawn
> >> >
> >> > --
> >> > John Blythe
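A practical follow-on to Shawn's last point: replication can only ship
segments that have been flushed to disk, so the master's hard-commit interval
effectively bounds how stale the slaves can be. A rough solrconfig.xml sketch
for the master (the intervals are illustrative assumptions, not
recommendations from this thread):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- hard commit: flushes segments to disk so the slaves can pull them;
         openSearcher=false keeps the commit cheap on the master -->
    <autoCommit>
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- soft commit: controls visibility of new docs on the master only;
         per Shawn, in-memory soft-committed segments are not replicated -->
    <autoSoftCommit>
      <maxTime>2000</maxTime>
    </autoSoftCommit>
  </updateHandler>

With something like this, the worst-case lag for a slave is roughly the
hard-commit interval plus the pollInterval plus transfer time, which is worth
weighing against the 1-2s target in John's original post.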