great. thanks, erick!

--
John Blythe

On Wed, Apr 11, 2018 at 12:16 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> bq: are you simply flagging the fact that we wouldn't direct the queries
> to A v. B v. C since SolrCloud will make the decisions itself as to which
> part of the distro gets hit for the operation
>
> Yep. SolrCloud takes care of it all itself. I should also add that there
> are about a zillion metrics now available in Solr that you can use to make
> the best use of hardware, including things like CPU usage, I/O, GC etc.
> SolrCloud doesn't _yet_ make use of these but will in future. The current
> software LB does a pretty simple round-robin distribution.
>
> Best,
> Erick
>
> On Wed, Apr 11, 2018 at 5:57 AM, John Blythe <johnbly...@gmail.com> wrote:
> > thanks, erick. great info.
> >
> >> although you can't (yet) direct queries to one or the other. So just
> >> making them all NRT and forgetting about it is reasonable.
> >
> > are you simply flagging the fact that we wouldn't direct the queries to A
> > v. B v. C since SolrCloud will make the decisions itself as to which part
> > of the distro gets hit for the operation? if not, can you expound on this
> > a bit more?
> >
> >> The very nature of merging is such that you will _always_ get large
> >> merges until you have 5G segments (by default)
> >
> > bummer
> >
> >> Quite possible, but you have to route things yourself. But in that case
> >> you're limited to one machine to handle all your NRT traffic. I skimmed
> >> your post so don't know whether your NRT traffic load is high enough to
> >> worry about.
> >
> > ok. i think we'll take a two-pronged approach. for the immediate purposes
> > of trying to solve an issue we've begun encountering we will begin
> > thoroughly testing the load between various operations in the
> > master-slave setup we've set up. pending the results, we can roll forward
> > w a temporary patch in which all end-user touch points route through the
> > primary box for read/write while large scale operations/processing we do
> > in the background will point to the ELB the slaves are sitting behind.
> > we'll also begin setting up a simple solrcloud instance to toy with per
> > your suggestion above. inb4 tons more questions on my part :)
> >
> > thanks!
> >
> > --
> > John Blythe
> >
> > On Tue, Apr 10, 2018 at 11:14 AM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> bq: should we try to bite the solrcloud bullet and be done w it
> >>
> >> that's what I'd do. As of 7.0 there are different "flavors", TLOG,
> >> PULL and NRT so that's also a possibility, although you can't (yet)
> >> direct queries to one or the other. So just making them all NRT and
> >> forgetting about it is reasonable.
> >>
> >> bq: is there some more config work we could put in place to avoid ...
> >> commit issue and the ultra large merge dangers
> >>
> >> No. The very nature of merging is such that you will _always_ get
> >> large merges until you have 5G segments (by default). The max segment
> >> size (outside "optimize/forceMerge/expungeDeletes" which you shouldn't
> >> do) is 5G so the steady-state worst-case segment pull is limited to
> >> that.
> >>
> >> bq: maybe for our initial need we use Master for writing and user
> >> access in NRT events, but slaves for the heavier backend
> >>
> >> Quite possible, but you have to route things yourself. But in that
> >> case you're limited to one machine to handle all your NRT traffic. I
> >> skimmed your post so don't know whether your NRT traffic load is high
> >> enough to worry about.
> >>
> >> The very first thing I'd do is set up a simple SolrCloud setup and
> >> give it a spin. Unless your indexing load is quite heavy, the added
> >> work the NRT replicas have in SolrCloud isn't a problem so worrying
> >> about that is premature optimization unless you have a heavy load.
> >>
> >> Best,
> >> Erick
> >>
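For reference, the 5G ceiling Erick mentions corresponds to TieredMergePolicy's
maxMergedSegmentMB setting. A minimal solrconfig.xml sketch with those defaults
spelled out (the values shown are the assumed Lucene defaults, not settings
taken from anyone's setup in this thread):

  <indexConfig>
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <!-- cap on merged segment size outside forceMerge/optimize;
           5120 MB is the assumed default, i.e. the "5G" referred to above -->
      <double name="maxMergedSegmentMB">5120</double>
      <!-- assumed defaults; these drive the "ten segments per level"
           merge behavior Shawn walks through later in the thread -->
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicyFactory>
  </indexConfig>

Lowering maxMergedSegmentMB would shrink the worst-case segment a slave has to
pull, at the cost of keeping more segments around; it does not prevent large
merges, which is Erick's point.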
> >> On Mon, Apr 9, 2018 at 4:36 PM, John Blythe <johnbly...@gmail.com> wrote:
> >> > Thanks a bunch for the thorough reply, Shawn.
> >> >
> >> > Phew. We’d chosen to go w Master-slave replication instead of SolrCloud
> >> > per the sudden need we had encountered and the desire to avoid the
> >> > nuances and changes related to moving to SolrCloud. But so much for
> >> > this being a more straightforward solution, huh?
> >> >
> >> > Few questions:
> >> > - should we try to bite the solrcloud bullet and be done w it?
> >> > - is there some more config work we could put in place to avoid the
> >> > soft commit issue and the ultra large merge dangers, keeping the
> >> > replications happening quickly?
> >> > - maybe for our initial need we use Master for writing and user access
> >> > in NRT events, but slaves for the heavier backend processing. Thoughts?
> >> > - anyone do consulting on this that would be interested in chatting?
> >> >
> >> > Thanks again!
> >> >
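On the "keeping the replications happening quickly" point, the polling
frequency is set on the slaves in the ReplicationHandler. A minimal
solrconfig.xml sketch of the 1 master / 4 slaves layout being discussed
(host name, core name, and the 2-second interval are illustrative
assumptions, not values from this thread):

  <!-- master: publish a new index version after every hard commit -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
    </lst>
  </requestHandler>

  <!-- each slave: poll the master every 2 seconds (HH:MM:SS) -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/yourcore</str>
      <str name="pollInterval">00:00:02</str>
    </lst>
  </requestHandler>

As Shawn explains below, a short pollInterval only helps once segments have
been flushed to disk; it does nothing for data sitting in a soft-committed,
in-memory segment, and a poll that coincides with a large merge still has to
pull the whole merged segment.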
> >> > On Mon, Apr 9, 2018 at 18:18 Shawn Heisey <apa...@elyograg.org> wrote:
> >> >
> >> >> On 4/9/2018 12:15 PM, John Blythe wrote:
> >> >> > we're starting to dive into master/slave replication architecture.
> >> >> > we'll have 1 master w 4 slaves behind it. our app is NRT. if user
> >> >> > performs an action in section A's data they may choose to jump to
> >> >> > section B which will be dependent on having the updates from their
> >> >> > action in section A. as such, we're thinking that the replication
> >> >> > time should be set to 1-2s (the chances of them arriving at section
> >> >> > B quickly enough to catch the 2s gap is highly unlikely at best).
> >> >>
> >> >> Once you start talking about master-slave replication, my assumption
> >> >> is that you're not running SolrCloud. You would NOT want to try and
> >> >> mix SolrCloud with replication. The features do not play well
> >> >> together. SolrCloud with NRT replicas (this is the only replica type
> >> >> that exists in 6.x and earlier) may be a better option than
> >> >> master-slave replication.
> >> >>
> >> >> > since the replicas will simply be looking for new files it seems
> >> >> > like this would be a lightweight operation even every couple seconds
> >> >> > for 4 replicas. that said, i'm going *entirely* off of assumption at
> >> >> > this point and wanted to check in w you all to see any nuances,
> >> >> > gotchas, hidden landmines, etc. that we should be considering before
> >> >> > rolling things out.
> >> >>
> >> >> Most of the time, you'd be correct to think that indexing is going to
> >> >> create a new small segment and replication will have little work to
> >> >> do. But as you create more and more segments, eventually Lucene is
> >> >> going to start merging those segments. For discussion purposes, I'm
> >> >> going to describe a situation where each new segment during indexing
> >> >> is about 100KB in size, and the merge policy is left at the default
> >> >> settings. I'm also going to assume that no documents are getting
> >> >> deleted or reindexed (which will delete the old version). Deleted
> >> >> documents can have an impact on merging, but it will usually only be a
> >> >> dramatic impact if there are a LOT of deleted documents.
> >> >>
> >> >> The first ten segments created will be this 100KB size. Then Lucene is
> >> >> going to see that there are enough segments to trigger the merge
> >> >> policy - it's going to combine ten of those segments into one that's
> >> >> approximately one megabyte. Repeat this ten times, and ten of those 1
> >> >> megabyte segments will be combined into one ten megabyte segment.
> >> >> Repeat all of THAT ten times, and there will be a 100 megabyte
> >> >> segment. And there will eventually be another level creating 1
> >> >> gigabyte segments. If the index is below 5GB in size, the entire thing
> >> >> *could* be merged into one segment by this process.
> >> >>
> >> >> The end result of all this: Replication is not always going to be
> >> >> super-quick. If merging creates a 1 gigabyte segment, then the amount
> >> >> of time to transfer that new segment is going to depend on how fast
> >> >> your disks are, and how fast your network is. If you're using
> >> >> commodity SATA drives in the 4 to 10 terabyte range and a gigabit
> >> >> network, the network is probably going to be the bottleneck --
> >> >> assuming that the system has plenty of memory and isn't under a high
> >> >> load. If the network is the bottleneck in that situation, it's
> >> >> probably going to take close to ten seconds to transfer a 1GB segment,
> >> >> and the greater part of a minute to transfer a 5GB segment, which is
> >> >> the biggest one that the default merge policy configuration will
> >> >> create without an optimize operation.
> >> >>
> >> >> Also, you should understand something that has come to my attention
> >> >> recently (and is backed up by documentation): If the master does a
> >> >> soft commit and the segment that was committed remains in memory (not
> >> >> flushed to disk), that segment will NOT be replicated to the slaves.
> >> >> It has to get flushed to disk before it can be replicated.
> >> >>
> >> >> Thanks,
> >> >> Shawn
> >> >
> >> > --
> >> > John Blythe
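A practical follow-on to Shawn's last point: replication can only ship
segments that have been flushed to disk, so the master's hard-commit interval
effectively bounds how stale the slaves can be. A rough solrconfig.xml sketch
for the master (the intervals are illustrative assumptions, not
recommendations from this thread):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- hard commit: flushes segments to disk so the slaves can pull them;
         openSearcher=false keeps the commit cheap on the master -->
    <autoCommit>
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- soft commit: controls visibility of new docs on the master only;
         per Shawn, in-memory soft-committed segments are not replicated -->
    <autoSoftCommit>
      <maxTime>2000</maxTime>
    </autoSoftCommit>
  </updateHandler>

With something like this, the worst-case lag for a slave is roughly the
hard-commit interval plus the pollInterval plus transfer time, which is worth
weighing against the 1-2s target in John's original post.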